
A General-Purpose UI for Csound on Android

Spurred on by some friends, I went on to put together a simple UI for playing with Csound on Android. Basically, it consists of five sliders, five press buttons and a trackpad, plus the accelerometer. The app can run CSD files that read the values of these controls via the chnget opcode and named channels, and it has a browse button that lets users load a CSD from the filesystem.

[Figure: the csdplayer UI]

A few user instructions are in order:

1. Controls all output values in the range 0 to 1, except the accelerometer, which has a much narrower range and needs to be scaled up (by around 1000) to give a significant range.

2. Channel names are as shown in the UI (slider1, etc.). Each button has two extra channels, "channel_name.x" and "channel_name.y", carrying the touch coordinates; the trackpad likewise has two: trackpad.x and trackpad.y.

3. Buttons are press-down rather than trigger controls: they output 1 while pressed and 0 otherwise. Their x and y channels are also 0 when there is no touch (the same applies to the trackpad).

4. The accelerometer channel names are "accelerometerX", "accelerometerY" and "accelerometerZ".

5. On my tablet the UI does not allow two controls to be adjusted simultaneously (I think this might be an Android limitation), so buttons etc. are "monophonic". The trackpad is monophonic too, but this might change in the next version, once I work out a channel naming scheme.

6. The on/off switch will go off at the end of the performance, if you reach it.

A very basic CSD showing how to access the controls is given below. Users can supply their own Csound code to run with the app.

<CsoundSynthesizer>
<CsOptions>
-odac -+rtaudio=null -d
</CsOptions>
<CsInstruments>
nchnls = 1

instr 1

idur = p3
iamp = p4*0dbfs
icps = p5
iatt = p6
idec = p7
itab = p8
k3 init 1

k1 chnget "slider1"
k1 port k1+1, 0.01, 1
a1 expseg 1,idur-idec,1,idec, 0.001
a2 linen a1, iatt, idur, 0.005
a3 oscili a2*iamp, icps*k1, itab
k3 chnget "trackpad.x"
k3 portk k3, 0.01
out a3*k3

chnmix a3*k3, "reverb"
endin

instr 100

asig chnget "reverb"
a1,a2 freeverb asig,asig, 0.8, 0.7
out (a1+a2)*0.1

chnclear "reverb"
endin

</CsInstruments>
<CsScore>

; example function table
f 1 0 16384 10 1

; example event
; ins st dur amp cps att dec tab
i 1 0 3600 0.5 440 0.01 .9 1
i 100 0 -1
 </CsScore>
</CsoundSynthesizer>

Finally, here is the link to the app:


CsoundApp.apk

The source code and Eclipse project can be found in the csound5 git repository:

git://csound.git.sourceforge.net/gitroot/csound/csound


A simple delay effect for Android

Some of you might wonder how to use the code in my previous post to build processing applications. In this note, I will show you how. In fact, I put some work into designing the OpenSL module so that it can be re-used: you don't have to touch it, only add it to your project.

You can follow the instructions in the previous post on how to set up the NDK project, or you can just copy it and start modifying the code. What we need to do is edit the opensl_example.c and opensl_example.h files, adding some processing capabilities to them.

We would like to build an echo effect, based on a comb filter, so let's put the infrastructure for it in place. We will create a simple 'class' in C, based on a data structure (its data members) and some functions to manipulate it (its methods). Here it is; we can add it to the top of the C source file:

typedef struct delayline_t {
 float *delay; // delayline
 int size;     //  length  in samples
 int rp;       // read pointer
 float fdb;    // feedback amount
} DELAYLINE;

DELAYLINE *delayline_create(float delay, float fdb) {
  DELAYLINE *p = (DELAYLINE *) calloc(1, sizeof(DELAYLINE));
  p->size = delay*SR;
  p->delay = calloc(p->size, sizeof(float));
  p->fdb = fdb > 0.f ? (fdb < 1.f ? fdb : 0.999999f) : 0.f;
  return p;
}
void delayline_process(DELAYLINE *p, float *buffer,
                       int size) {
  // process the delay, replacing the buffer
  float out, *delay = p->delay, fdb = p->fdb;
  int i, dsize = p->size, *rp = &(p->rp);
  for(i = 0; i < size; i++){
    out = delay[*rp];
    delay[(*rp)++] = buffer[i] + out*fdb;
    if(*rp == dsize) *rp = 0;
    buffer[i] = out;
  }
}

void delayline_destroy(DELAYLINE *p){
  // free memory
  if(p == NULL) return;
  if(p->delay != NULL) free(p->delay);
  free(p);
}
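
In signal terms, this implements the recursive (feedback) comb filter y(n) = x(n-D) + fdb*y(n-D), where D = delay*SR is the delay length in samples: each input sample comes back D samples later, scaled by the feedback amount, producing a decaying series of echoes.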

With this in place, all we need to do is modify our main processing function to use the delay line. We will add a couple of arguments to it (and rename it main_process instead of start_process), so that from the Java app code we can set the delay time and feedback amount:

void main_process(float delay, float fdb) {
 OPENSL_STREAM *p;
 int samps, i;
 float buffer[VECSAMPS];
 DELAYLINE *d;
 p = android_OpenAudioDevice(SR,1,1,BUFFERFRAMES);
 if(p == NULL) return;
 d = delayline_create(delay, fdb);
 if(d == NULL) {
  android_CloseAudioDevice(p);
  return;
 }

 on = 1;
 while(on) {
   samps = android_AudioIn(p,buffer,VECSAMPS);
   delayline_process(d,buffer,samps);
   android_AudioOut(p,buffer,samps);
 }
 android_CloseAudioDevice(p);
}

Now only two things are left to do: edit the opensl_example.h header file, replacing the start_process() prototype with main_process(float delay, float fdb),

#ifdef __cplusplus
extern "C" {
#endif
 void main_process(float delay, float fdb);
 void stop_process();
#ifdef __cplusplus
};
#endif

and then modify the Java code to call the new function:

t = new Thread() {
 public void run() {
 setPriority(Thread.MAX_PRIORITY);
 opensl_example.main_process(0.5f, 0.7f);
 }
 };

and you should be good to go!

Android audio streaming with OpenSL ES and the NDK.

Audio streaming in Android is a topic that has not been covered much in the Android documentation or in programming examples. To help fill that gap, I would like to discuss the use of the OpenSL ES API through the Android Native Development Kit (NDK). For those of you who are new to Android programming, it is important to explain a little how the various components of the development system work.

First we have the top-level application programming environment, the Android SDK, which is Java-based and supports audio streaming via the AudioTrack API. There are various examples of AudioTrack applications around, including the pd-android and SuperCollider for Android projects.

In addition to the SDK, Android also provides a slightly lower-level programming environment, the NDK, which allows developers to write C or C++ code that can be used in the application via the Java Native Interface (JNI). Since Android 2.3, the NDK has included the OpenSL ES API, which has not been used widely at the time of writing. One project currently employing it is Csound for Android. This note discusses the use of the OpenSL API and the NDK environment for the development of audio streaming apps.

Setting up the development environment

For this, you will need to go to the Google Android development site and download all the tools: the SDK, the NDK and the Eclipse plugin. You will also need to get the Eclipse IDE; the 'classic' version is probably the most suitable for this work. The instructions for installing these packages are very clear, and there is plenty of information on the internet to help you if things go wrong.

Another useful tool for Android development is SWIG, which is used to create the Java code that wraps the C functions we will write. It is not strictly required, because you can use the JNI directly. However, it is very handy, as the JNI is not the easiest piece of development software around (some would call it 'a nightmare'). SWIG wraps C code very well and simplifies the process immensely. We will use it in the example discussed here.

An example project

The example project we will be discussing can be obtained via git with the following command:

$git clone https://bitbucket.org/victorlazzarini/android-audiotest

Alternatively, these sources can be obtained from the same location as an archive, via the web page interface.

The project consists of an NDK project for the OpenSL streaming IO module and an Eclipse project for the application example. The NDK part is built first by running the top-level script

$sh build.sh

This simple script first sets up the location of the downloaded NDK (you will need to set this to match your system locations)

export ANDROID_NDK_ROOT=$HOME/work/android-ndk-r7

and then proceeds to call SWIG to build the Java interface code that will link our C opensl_example module to the app. It creates both a C++ file wrapping the C code and the Java classes we need in order to run it:

swig -java -package opensl_example -includeall -verbose \
 -outdir src/opensl_example -c++ -I/usr/local/include \
 -I/System/Library/Frameworks/JavaVM.framework/Headers \
 -I./jni -o jni/java_interface_wrap.cpp opensl_example_interface.i

When this is done, it calls the NDK build script,

$ANDROID_NDK_ROOT/ndk-build TARGET_PLATFORM=android-9 V=1

that will build a dynamically-loadable module (.so) containing our native code. This script is hardwired to use the Android.mk file in the ./jni directory.

Once the NDK part is built, we can turn to Eclipse. After starting it, we should import the project using File->Import and the 'Import into existing workspace' option. It will ask for the project directory; we just browse to and select the top-level one (android-audiotest). If everything has gone according to plan, you can plug in your device and build and run the project as an Android application. The application will be built and run on the device, and at this point you will be able to talk into the mic and hear your voice over the speakers (or, more appropriately, a pair of headphones).

The native interface code

Two source files make up the native part of this project: opensl_io.c, which has all the audio streaming functions, and opensl_example.c, which uses them to implement the simple audio processing example. A reference for the OpenSL API is found in the OpenSL ES 1.0.1 specification, which is distributed in the Android NDK docs/opensl directory. There we also find some specific documentation on the Android implementation of the API, which is available online as well.

Opening the device for audio output

The entry point into OpenSL is through the creation of the audio engine, as in

result = slCreateEngine(&(p->engineObject), 0, NULL, 0, NULL, NULL);

This initialises an engine object of type SLObjectItf (which in the example above is held in a data structure pointed to by p). Once an engine is created, it needs to be realised (this is going to be a common pattern with OpenSL objects: creation followed by realisation). An engine interface is then obtained, which will be used subsequently to open and initialise the input and output devices (with their sources and sinks):

result = (*p->engineObject)->Realize(p->engineObject, SL_BOOLEAN_FALSE);
...
result = (*p->engineObject)->GetInterface(p->engineObject,
                                     SL_IID_ENGINE, &(p->engineEngine));

Once the interface to the engine object is obtained, we can use it to create other API objects. In general, for all API objects, we:

  1. create the object (instantiation)
  2. realise it (initialisation)
  3. obtain an interface to it (to access any features needed), via the GetInterface() method

In the case of playback, the first object to be created is the Output Mix (also an SLObjectItf), which is then realised:

const SLInterfaceID ids[] = {SL_IID_VOLUME};
const SLboolean req[] = {SL_BOOLEAN_FALSE};
result = (*p->engineEngine)->CreateOutputMix(p->engineEngine,
                                    &(p->outputMixObject), 1, ids, req);
...
result = (*p->outputMixObject)->Realize(p->outputMixObject,
                                                 SL_BOOLEAN_FALSE);

As we will not need to manipulate it, we do not need to get its interface. Now we configure the source and sink of the player object we are about to create. For output, the source is going to be a buffer queue, which is where we will send our audio data samples. We configure it with the usual parameters: data format, number of channels, sampling rate (sr), etc.:

SLDataLocator_AndroidSimpleBufferQueue loc_bufq =
                           {SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE, 2};
SLDataFormat_PCM format_pcm = {SL_DATAFORMAT_PCM,channels,sr,
               SL_PCMSAMPLEFORMAT_FIXED_16, SL_PCMSAMPLEFORMAT_FIXED_16,
               speakers, SL_BYTEORDER_LITTLEENDIAN};
SLDataSource audioSrc = {&loc_bufq, &format_pcm};

and the sink is the Output Mix we created above:

SLDataLocator_OutputMix loc_outmix = {SL_DATALOCATOR_OUTPUTMIX,
                                                p->outputMixObject};
SLDataSink audioSnk = {&loc_outmix, NULL};

The audio player object then gets created with this source and sink, and realised:

const SLInterfaceID ids1[] = {SL_IID_ANDROIDSIMPLEBUFFERQUEUE};
const SLboolean req1[] = {SL_BOOLEAN_TRUE};
result = (*p->engineEngine)->CreateAudioPlayer(p->engineEngine,
                    &(p->bqPlayerObject), &audioSrc, &audioSnk,
                     1, ids1, req1);
...
result = (*p->bqPlayerObject)->Realize(p->bqPlayerObject, 
                                             SL_BOOLEAN_FALSE);

Then we get the player object interface,

result = (*p->bqPlayerObject)->GetInterface(p->bqPlayerObject, 
                                 SL_IID_PLAY,&(p->bqPlayerPlay));

and the buffer queue interface (of type SLBufferQueueItf)

result = (*p->bqPlayerObject)->GetInterface(p->bqPlayerObject,
       SL_IID_ANDROIDSIMPLEBUFFERQUEUE, &(p->bqPlayerBufferQueue));

The OpenSL API provides a callback mechanism for audio IO. However, unlike other asynchronous audio IO implementations, such as CoreAudio or Jack, the callback does not pass the audio buffers for processing as one of its arguments. Instead, the callback is only used to signal the application, indicating that the buffer queue is ready to receive data.

The buffer queue interface obtained above will be used to set up a callback (bqPlayerCallback, which is passed p as context):

result = (*p->bqPlayerBufferQueue)->RegisterCallback(
                      p->bqPlayerBufferQueue,bqPlayerCallback, p);

Finally, the player interface is used to start audio playback:

result = (*p->bqPlayerPlay)->SetPlayState(p->bqPlayerPlay,
                                            SL_PLAYSTATE_PLAYING);

Opening the device for audio input

The process of starting to record audio data is very similar to playback. First we set up our source and sink, which will be the audio input device and a buffer queue, respectively:

SLDataLocator_IODevice loc_dev = {SL_DATALOCATOR_IODEVICE,
                      SL_IODEVICE_AUDIOINPUT,
                      SL_DEFAULTDEVICEID_AUDIOINPUT, NULL};
SLDataSource audioSrc = {&loc_dev, NULL};
...
SLDataLocator_AndroidSimpleBufferQueue loc_bq =
                      {SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE, 2};
SLDataFormat_PCM format_pcm = {SL_DATAFORMAT_PCM, channels, sr,
          SL_PCMSAMPLEFORMAT_FIXED_16, SL_PCMSAMPLEFORMAT_FIXED_16,
          speakers, SL_BYTEORDER_LITTLEENDIAN};
SLDataSink audioSnk = {&loc_bq, &format_pcm};

Then we create an audio recorder, realize it and get its interface:

const SLInterfaceID id[1] = {SL_IID_ANDROIDSIMPLEBUFFERQUEUE};
const SLboolean req[1] = {SL_BOOLEAN_TRUE};
result = (*p->engineEngine)->CreateAudioRecorder(p->engineEngine,
                              &(p->recorderObject), &audioSrc,
                               &audioSnk, 1, id, req);
...
result = (*p->recorderObject)->Realize(p->recorderObject,
                                          SL_BOOLEAN_FALSE);
...
result = (*p->recorderObject)->GetInterface(p->recorderObject,
                           SL_IID_RECORD, &(p->recorderRecord));

The buffer queue interface is obtained and the callback set:

result = (*p->recorderObject)->GetInterface(p->recorderObject,
     SL_IID_ANDROIDSIMPLEBUFFERQUEUE, &(p->recorderBufferQueue));
...
result = (*p->recorderBufferQueue)->RegisterCallback(
                   p->recorderBufferQueue, bqRecorderCallback,p);

We can now start audio recording:

result = (*p->recorderRecord)->SetRecordState(
                      p->recorderRecord,SL_RECORDSTATE_RECORDING);

Audio IO

Streaming audio to/from the device is done by the Enqueue() method of SLBufferQueueItf:

SLresult (*Enqueue) (SLBufferQueueItf self,
                     const void *pBuffer, SLuint32 size);

This should be called whenever the buffer queue is ready for a new data buffer (either for input or output). As soon as the player or recorder object is set into the playing or recording state, the buffer queue will be ready for data. After this, the callback mechanism is responsible for signalling the application that the buffer queue is ready for another block of data. We can call the Enqueue() method in the callback itself, or elsewhere. If we choose the former, then to get the callback mechanism running we need to enqueue a buffer as we start recording or playing, otherwise the callback will never be called.

An alternative is to use the callback only to notify the application, waiting on it when we have a full buffer to deliver. In this case we employ a double buffer, so that while one half is enqueued, the other is being filled or consumed by our application. This allows us to create a simple interface that receives a block of audio to be written to the device, or delivers a block of samples read from it.

Here is what we do for input. The callback is very minimal: it just notifies our main processing thread that the buffer queue is ready:

void bqRecorderCallback(SLAndroidSimpleBufferQueueItf bq, void *context)
{
  OPENSL_STREAM *p = (OPENSL_STREAM *) context;
  notifyThreadLock(p->inlock);
}
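
The waitThreadLock() and notifyThreadLock() calls used here come from opensl_io.c, but are not reproduced in this post. For reference, here is a minimal sketch of how such helpers can be written with pthreads; the names follow the calls in the code, but the details (including how the lock is created) are assumptions rather than the actual implementation:

#include <stdlib.h>
#include <pthread.h>

/* a "thread lock": a mutex and condition variable guarding a ready flag */
typedef struct threadlock_t {
  pthread_mutex_t m;
  pthread_cond_t  c;
  unsigned char   s;   /* 1 = notified, 0 = waiting */
} threadLock;

void *createThreadLock(void) {
  threadLock *p = (threadLock *) calloc(1, sizeof(threadLock));
  if (p == NULL) return NULL;
  pthread_mutex_init(&(p->m), NULL);
  pthread_cond_init(&(p->c), NULL);
  p->s = 1;  /* start notified so the first wait does not block */
  return p;
}

void waitThreadLock(void *lock) {
  threadLock *p = (threadLock *) lock;
  pthread_mutex_lock(&(p->m));
  while (!p->s) pthread_cond_wait(&(p->c), &(p->m));
  p->s = 0;
  pthread_mutex_unlock(&(p->m));
}

void notifyThreadLock(void *lock) {
  threadLock *p = (threadLock *) lock;
  pthread_mutex_lock(&(p->m));
  p->s = 1;
  pthread_cond_signal(&(p->c));
  pthread_mutex_unlock(&(p->m));
}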

Meanwhile, the processing loop calls the audio input function to get a block of samples. When the input buffer has been emptied, we wait for the notification, enqueue the buffer so that the device can fill it again, and switch buffers:

int android_AudioIn(OPENSL_STREAM *p,float *buffer,int size){
  short *inBuffer;
  int i, bufsamps, index;
  if(p == NULL || p->inBufSamples == 0) return 0;
  bufsamps = p->inBufSamples;
  index = p->currentInputIndex;

  inBuffer = p->inputBuffer[p->currentInputBuffer];
  for(i=0; i < size; i++){
    if (index >= bufsamps) {
      waitThreadLock(p->inlock);
      (*p->recorderBufferQueue)->Enqueue(p->recorderBufferQueue,
                     inBuffer,bufsamps*sizeof(short));
      p->currentInputBuffer = (p->currentInputBuffer ? 0 : 1);
      index = 0;
      inBuffer = p->inputBuffer[p->currentInputBuffer];
    }
    buffer[i] = (float) inBuffer[index++]*CONVMYFLT;
  }
  p->currentInputIndex = index;
  if(p->outchannels == 0) p->time += (double) size/(p->sr*p->inchannels);
  return i;
}

For output, we do the reverse. The callback is exactly the same, but now it notifies us that the device has consumed our buffer. In the processing loop, we call a function that fills the output buffer with the blocks we pass to it. When the buffer is full, we wait for the notification so that we can enqueue the data and switch buffers:

int android_AudioOut(OPENSL_STREAM *p, float *buffer, int size){
  short *outBuffer;
  int i, bufsamps, index;
  if(p == NULL || p->outBufSamples == 0) return 0;
  bufsamps = p->outBufSamples;
  index = p->currentOutputIndex;
  outBuffer = p->outputBuffer[p->currentOutputBuffer];

  for(i=0; i < size; i++){
    outBuffer[index++] = (short) (buffer[i]*CONV16BIT);
    if (index >= bufsamps) {
      waitThreadLock(p->outlock);
      (*p->bqPlayerBufferQueue)->Enqueue(p->bqPlayerBufferQueue,
                     outBuffer,bufsamps*sizeof(short));
      p->currentOutputBuffer = (p->currentOutputBuffer ? 0 : 1);
      index = 0;
      outBuffer = p->outputBuffer[p->currentOutputBuffer];
    }
  }
  p->currentOutputIndex = index;
  p->time += (double) size/(p->sr*p->outchannels);
  return i;
}

The interface

The code discussed above is structured into a minimal API for audio streaming with OpenSL. It contains five functions (and one opaque data structure):

/*
  Open the audio device with a given sampling rate (sr), input and
  output channels and IO buffer size in frames.
  Returns a handle to the OpenSL stream
*/
OPENSL_STREAM* android_OpenAudioDevice(int sr, int inchannels,
                                int outchannels, int bufferframes);
/*
Close the audio device
*/
void android_CloseAudioDevice(OPENSL_STREAM *p);
/*
Read a buffer from the OpenSL stream *p, of size samples.
Returns the number of samples read.
*/
int android_AudioIn(OPENSL_STREAM *p, float *buffer,int size);
/*
Write a buffer to the OpenSL stream *p, of size samples.
Returns the number of samples written.
*/
int android_AudioOut(OPENSL_STREAM *p, float *buffer,int size);
/*
Get the current IO block time in seconds
*/
double android_GetTimestamp(OPENSL_STREAM *p);

Processing

The example is completed by a trivial processing function, start_process(), which we wrap in Java so that it can be called by the application. It employs the API described above:

p = android_OpenAudioDevice(SR,1,2,BUFFERFRAMES);
...
while(on) {
   samps = android_AudioIn(p,inbuffer,VECSAMPS_MONO);
   for(i = 0, j=0; i < samps; i++, j+=2)
     outbuffer[j] = outbuffer[j+1] = inbuffer[i];
   android_AudioOut(p,outbuffer,VECSAMPS_STEREO);
  }
android_CloseAudioDevice(p);

A stop_process() function is also supplied, so that we can stop the streaming and close the application.
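
The stop_process() function itself is not reproduced here, but since the processing loop runs while the on flag is set, a minimal version could be as simple as the sketch below (assuming on is a file-scope flag in the example module, as suggested by the while(on) loop above):

static int on;  /* file-scope flag, set to 1 when the processing loop starts */

/* sketch: clearing the flag makes the while(on) loop exit,
   after which the audio device is closed and the thread returns */
void stop_process() {
  on = 0;
}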

The application code

Finally, completing the project, we have a small Java class, based on the auto-generated Eclipse application code, with the addition of a secondary thread and calls to the two wrapped native functions described above:

public class AudiotestActivity extends Activity {
    /** Called when the activity is first created. */
    Thread thread;
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        thread = new Thread() {
            public void run() {
                setPriority(Thread.MAX_PRIORITY);
                opensl_example.start_process();
            }
        };
        thread.start();   
    }
    public void onDestroy(){
        super.onDestroy();
        opensl_example.stop_process();
        try {
            thread.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        thread = null;
    }
}

Final Words

I hope these notes have been useful in shedding some light on the development of audio processing applications using native code and OpenSL. While it does not (yet) offer a lower-latency alternative to the AudioTrack SDK API, it can still potentially deliver better performance, as native code is not subject to virtual machine overheads such as garbage collection pauses. We assume that this is the way forward for audio development on Android. In the NDK document on the Android OpenSL implementation, we read that

as the Android platform and specific device implementations continue to evolve, an OpenSL ES application can expect to benefit from any future system performance improvements.

This indicates that Android audio developers should perhaps take some notice of OpenSL. The lack of examples has not been very encouraging, but I hope this post is a step towards changing that.

Faust adventures

Lately, I have been looking into faust (functional audio stream), an interesting language for DSP developed by the people at GRAME, mainly Yann Orlarey and Stéphane Letz. It is a purely functional language for writing programs that can be compiled into C++ code for various platforms (PD, Csound and SC3 plugins; standalone applications; LLVM; mobile apps; etc.). It also has a nice feature that lets you generate mathematical documentation describing the DSP process implemented in the program.

The main thing is that we have to get into a functional mode of thinking in order to use and appreciate the system. For instance, typical imperative code is not directly translatable into faust. Here are some basic elements of the language:

1) The entry point in faust is a ‘process’:

process =  ...

2) There is a basic set of language primitives: arithmetic and comparison operators (including bitwise operations); means of accessing foreign (C++) functions, variables and constants; identity and cut 'boxes' (for passing or stopping signals); memory (single-sample) and delay operators; tables (read-only and read-write); a selection switch; and GUI components (slider, button, number box, etc.).

3) Programs combine these operations using one of five composition operators: serial (:); parallel (,); split (<:); merge (:>); and recursion (~).

And that's it (there are some further language elements, but with these we can already do a lot).

Now the complication starts. Let's program a simple sine wave signal. The first thing we need is a way to represent time in samples. In an imperative language, we could do this with a loop like this:

for(i=0; i < end; i++) ...

Well, there are no loop primitives here, so what can we do? We use recursion. Here is time, counted in samples, in faust:

time = (+(1) ~ _ )  - 1;

The recursion operator ~ combines a signal identity ( _ ), which can be understood as the output of the process, and the function +(1), which adds one to it. The recursive composition implies a 1-sample delay as the signal is fed back. So (+(1) ~ _ ) gives us a sequence 1,2,3,… . The subtraction by 1 is to bring the start of the sequence to 0, as we normally treat time sequences as 0-based.

Now, with time in hand, we can generate a sinewave signal. We need a sine function, which we can obtain as a C++ function using a primitive:

sine = ffunction(float sinf (float), <math.h>, "");

Similarly, we can define the PI and SR constants:

PI = 3.1415926535897932385;
SR = 44100.;

With this, we have a faust program that produces a 440Hz sinewave:

process = time : *(2*PI/SR) : *(440) : sine;

You can see the (extreme) use of the serial composition operator. We start with the sample counter, multiply it by 2*PI/SR so that the signal goes from 0 to 2*PI over 44100 samples, then multiply it by 440 so that it goes through 440 multiples of 2*PI every 44100 samples. This signal is then sent to the sine function.

For those of you who find this too painful, faust allows a more usual function notation as syntactic sugar (not allowed for diabetics):

process =  sine(2*PI*440*time/SR);
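
For comparison with the imperative loop sketched earlier, here is roughly what this faust program computes, written as plain C (only a sketch; output() is a hypothetical stand-in for an audio sink, not part of faust or of the examples above):

#include <math.h>

#define SR 44100.0f
#define PI 3.1415926535897932385f

extern void output(float sample);  /* hypothetical audio sink */

/* sketch: one sample of the 440 Hz sine per loop iteration,
   mirroring process = sine(2*PI*440*time/SR) */
void sine440(int nsamples) {
  int time;
  for (time = 0; time < nsamples; time++)
    output(sinf(2*PI*440*time/SR));
}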

So there you go, Faust. I will hopefully bring some further examples to the mix in later posts.

Delayline Pitchshifter

This note discusses a delay-line pitch shifter that can be used to transpose sounds by small amounts, quite effectively. It is based on a circular delay line and two variable taps that read the buffer at different 'speeds', depending on the amount of pitch shift required. For better results on monophonic pitched sounds, if we know the original pitch, we can set the delay line to match the fundamental period, which reduces the artifacts introduced by the shift.

Again, we’ll be using Python and its scientific libraries, which provide a nice environment for studying these processes:

from pylab import *
from scipy.io import wavfile

The whole process is placed inside a function that takes a few parameters: signal arrays for input and output, the pitch ratio, and the delay time in samples. It produces an output signal array. The function sets up a delay line with two variable taps, and uses a triangular (Bartlett) window to fade between the taps:

def pitchshifter(sigin,sigout,pitch,deltime):

    size = deltime         # delay time in samples
    delay = zeros(size)    # delay line
    env = bartlett(size)   # fade envelope table
    tap1 = 0               # tap positions
    tap2 = size/2

    wp = 0                 # write pos

The processing code then loops over the input signal, filling the delay line and reading it with the two taps:

    for i in range(0, len(sigin)):

        delay[wp] = sigin[i]   # fill the delay line

        # first tap, linear interp readout
        frac = tap1 - int(tap1)
        if tap1 < size - 1: delaynext = delay[int(tap1)+1]
        else: delaynext = delay[0]
        sig1 = delay[int(tap1)] + frac*(delaynext - delay[int(tap1)])

        # second tap, linear interp readout
        frac = tap2 - int(tap2)
        if tap2 < size - 1: delaynext = delay[int(tap2)+1]
        else: delaynext = delay[0]
        sig2 = delay[int(tap2)] + frac*(delaynext - delay[int(tap2)])

The tap signals are faded using the triangular window. The position within the fade envelope is based on the difference between the tap and the delay write position (the envelope should be 0 at the write position and at its maximum when the tap is furthest away from it). This envelope is crucial to avoid the discontinuity that happens when the taps overtake, or are overtaken by, the write pointer.

        # fade envelope positions
        ep1 = tap1 - wp
        if ep1 < 0: ep1 += size
        ep2 = tap2 - wp
        if ep2 < 0: ep2 += size

        # combine tap signals
        sigout[i] = env[int(ep1)]*sig1 + env[int(ep2)]*sig2

Finally, we increment the taps by the required pitch transposition ratio and wrap them around the circular delay buffer. At the end we also increment the write pointer by one position:

        # increment tap pos according to pitch transposition
        tap1 += pitch
        tap2 = tap1 + size/2

        # keep tap pos within the delay memory bounds
        while tap1 >= size: tap1 -= size
        while tap1 < 0: tap1 += size
        while tap2 >= size: tap2 -= size
        while tap2 < 0: tap2 += size

        # increment write pos
        wp += 1
        if wp == size: wp = 0

The function returns the output signal array, now filled with the transposed audio.

    return sigout

A short harmoniser program can be written like this. We take an input file and a transposition in 12TET semitones, create an array to hold the output, and pitch shift the signal. The delay size is set to twice the fundamental period of the input signal (this is set by hand; here we are assuming a fundamental frequency of 131 Hz, C2). This makes sure the taps are spaced by one fundamental period. The output is a mix of the original and the transposed signals:

(sr,signalin) = wavfile.read(sys.argv[2])
pitch = 2.**(float(sys.argv[1])/12.)
signalout = zeros(len(signalin))
fund = 131.
dsize = int(sr/(fund*0.5))
signalout = pitchshifter(signalin,signalout,pitch,dsize)
wavfile.write(sys.argv[3],sr,array((signalout+signalin)/2., dtype='int16'))

Finally, the program could be completed by an automatic pitch tracking element, setting the delays to accommodate changes in the input. That would make a nice exercise for anyone studying this code.

Here is the full program:


from pylab import *
from scipy.io import wavfile
import sys

def pitchshifter(sigin,sigout,pitch,deltime):

    size = deltime         # delay time in samples
    delay = zeros(size)    # delay line
    env = bartlett(size)   # fade envelope table
    tap1 = 0               # tap positions
    tap2 = size/2
    wp = 0                 # write pos

    for i in range(0, len(sigin)):

        delay[wp] = sigin[i]   # fill the delay line

        # first tap, linear interp readout
        frac = tap1 - int(tap1)
        if tap1 < size - 1: delaynext = delay[int(tap1)+1]
        else: delaynext = delay[0]
        sig1 = delay[int(tap1)] + frac*(delaynext - delay[int(tap1)])

        # second tap, linear interp readout
        frac = tap2 - int(tap2)
        if tap2 < size - 1: delaynext = delay[int(tap2)+1]
        else: delaynext = delay[0]
        sig2 = delay[int(tap2)] + frac*(delaynext - delay[int(tap2)])

        # fade envelope positions
        ep1 = tap1 - wp
        if ep1 < 0: ep1 += size
        ep2 = tap2 - wp
        if ep2 < 0: ep2 += size

        # combine tap signals
        sigout[i] = env[int(ep1)]*sig1 + env[int(ep2)]*sig2

        # increment tap pos according to pitch transposition
        tap1 += pitch
        tap2 = tap1 + size/2

        # keep tap pos within the delay memory bounds
        while tap1 >= size: tap1 -= size
        while tap1 < 0: tap1 += size
        while tap2 >= size: tap2 -= size
        while tap2 < 0: tap2 += size

        # increment write pos
        wp += 1
        if wp == size: wp = 0

    return sigout

(sr,signalin) = wavfile.read(sys.argv[2])
pitch = 2.**(float(sys.argv[1])/12.)
signalout = zeros(len(signalin))

fund = 131.
dsize = int(sr/(fund*0.5))
print dsize
signalout = pitchshifter(signalin,signalout,pitch,dsize)
wavfile.write(sys.argv[3],sr,array((signalout+signalin)/2., dtype='int16'))

A Phase Vocoder in Python

In this note, I will show how the Phase Vocoder algorithm can be realised in Python, with the help of its very useful scientific libs. This little PV program does timestretching of an input.

First we import the required packages: sys, scipy, pylab and scipy.io. I am being quite liberal here in not using namespaces; good practice tells us we should not do this, but it simplifies the reading of the code:

import sys
from scipy import *
from pylab import *
from scipy.io import wavfile

Then we set our analysis parameters, the DFT size (N) and the hopsize (H):

N = 2048
H = N/4

Take in an input soundfile name and a timescale factor from the command line:

# read input and get the timescale factor
(sr,signalin) = wavfile.read(sys.argv[2])
L = len(signalin)
tscale = float(sys.argv[1])

Set up our signal arrays to hold the processing output

# signal blocks for processing and output
phi  = zeros(N)
out = zeros(N, dtype=complex)
sigout = zeros(int(L/tscale)+N)

We find the peak amplitude of the input (for scaling) and create a Hanning window:

# max input amp, window
amp = max(signalin)
win = hanning(N)

This is the processing loop. We will do the PV idea in a slightly different way from the example in the book. There, we created a spectral signal made up of amplitude and frequency frames. Here we will not bother with that; we will just move along the input, calculating the PV parameters of two consecutive windows and resynthesising them straight away. Timescale changes happen when we move along the input at a hopsize different from H. The output is overlap-added every H samples, which is also the hopsize of our PV analyses (the hop between the two consecutive analysis windows).

p = 0
pp = 0
while p < L-(N+H):

    # take the spectra of two consecutive windows
    p1 = int(p)
    spec1 = fft(win*signalin[p1:p1+N])
    spec2 = fft(win*signalin[p1+H:p1+N+H])

    # take their phase difference and integrate
    phi += (angle(spec2) - angle(spec1))

    # bring the phase back to between pi and -pi (elementwise wrap)
    phi = (phi + pi) % (2*pi) - pi
    out.real, out.imag = cos(phi), sin(phi)

    # inverse FFT and overlap-add (the result should be
    # near-real, so we keep only the real part)
    sigout[pp:pp+N] += win*ifft(abs(spec2)*out).real
    pp += H
    p += H*tscale

Then we just write the output and scale it to match the original amp.

  #  write file to output, scaling it to original amp

wavfile.write(sys.argv[3],sr,array(amp*sigout/max(sigout), dtype='int16'))

We also attempt to play the result using sndfile-play, if it is available:

  #  play it using a libsndfile utility
import os
try: os.spawnlp(os.P_WAIT, 'sndfile-play', 'sndfile-play', sys.argv[3])
except: pass

So, a slightly different way of doing things, demonstrating that there is always more than one way to skin a goat. Here is the full program (I hope the formatting does not get broken too much):

# phase vocoder example
# (c) V Lazzarini, 2010
# GNU Public License
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #


import sys
from scipy import *
from pylab import *
from scipy.io import wavfile

N = 2048
H = N/4

# read input and get the timescale factor
(sr,signalin) = wavfile.read(sys.argv[2])
L = len(signalin)
tscale = float(sys.argv[1])

# signal blocks for processing and output
phi = zeros(N)
out = zeros(N, dtype=complex)
sigout = zeros(int(L/tscale)+N)

# max input amp, window
amp = max(signalin)
win = hanning(N)

p = 0
pp = 0
while p < L-(N+H):

    # take the spectra of two consecutive windows
    p1 = int(p)
    spec1 = fft(win*signalin[p1:p1+N])
    spec2 = fft(win*signalin[p1+H:p1+N+H])

    # take their phase difference and integrate
    phi += (angle(spec2) - angle(spec1))

    # bring the phase back to between pi and -pi (elementwise wrap)
    phi = (phi + pi) % (2*pi) - pi
    out.real, out.imag = cos(phi), sin(phi)

    # inverse FFT and overlap-add (keeping the real part)
    sigout[pp:pp+N] += win*ifft(abs(spec2)*out).real
    pp += H
    p += H*tscale

# write file to output, scaling it to original amp
wavfile.write(sys.argv[3],sr,array(amp*sigout/max(sigout), dtype='int16'))

# play it using a libsndfile utility
import os
try:
    os.spawnlp(os.P_WAIT, 'sndfile-play', 'sndfile-play', sys.argv[3])
except:
    pass