Monthly Archives: December 2013

More Audio Woes

TL;DR at end of post if you want to skip my nonsense.

Here’s something that almost made me tear my laptop in two like a phone book.

I’ve moved to a lower-level sound model so I can have sample-accurate control. I’m still using two-bar phrases which need to be connected together seamlessly; the overhead in the typical Sound.play() model was causing a hiccup of several milliseconds, unnoticeable to the human ear on its own but disastrous for timing. With the lower-level model, the algorithm looks something like this:

function processSound(event:SampleDataEvent):void {
	var bytes:ByteArray = new ByteArray();
	var numSamples:int = 0;
	var startSample:int = -1;
	while (numSamples < BUFFER_SIZE) {
		var samplesRequest:int = BUFFER_SIZE - numSamples;
		numSamples += _currentTrack.extract(bytes, samplesRequest, startSample);
		startSample = 0;
	}
	event.data.writeBytes(bytes);
}

Note first of all that the actual .as file has an additional 42 lines of explanatory, questioning, soul-searching comments and debug statements, excised here for your benefit. The idea is simple: we fill a ByteArray with up to BUFFER_SIZE samples (8 bytes each: two 32-bit floats for the left and right channels), looping back to the beginning of the sound if we need more, and so on.

Problem Number 1: Let's say you read the last remaining 1987 samples in the Sound. You loop and ask for an additional 6205 samples. Flash gives you 6204 for some reason. So you loop again and ask for 1 measly sample. Yet stingy Flash gives you 0. And you loop again and again and again and again okay we get it, that's great, we get it. Why?

After some debugging I found that Sound.extract() would only return a number of samples that was a multiple of four: 8192, 8188, 6204, 4, etc. Okay, the fix for that is simple, if stupid: just stop filling once you're within 4 samples of BUFFER_SIZE. That means your buffer isn't quite up to size, but that shouldn't really affect anything, especially since we're working with a large buffer. So you can do

while (numSamples < (BUFFER_SIZE - MINIMUM_SAMPLE_REQUEST)){...}

and you should be fine.
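Here's the workaround sketched outside Flash, in plain JavaScript, with a fake extract() standing in for Sound.extract()'s observed round-to-four behavior. All names and numbers below are mine, not Flash's:

```javascript
const BUFFER_SIZE = 8192;
const MINIMUM_SAMPLE_REQUEST = 4;

// Fake stand-in for Sound.extract(): mid-stream it grants only multiples
// of four (the behavior observed above); hitting the end of the sound
// yields the exact remainder.
function makeSound(totalSamples) {
  let pos = 0;
  return {
    extract(request, startSample) {
      if (startSample === 0) pos = 0; // seek back to the beginning
      const available = totalSamples - pos;
      let granted = Math.min(request, available);
      if (granted < available) granted &= ~3; // round down to a multiple of 4
      pos += granted;
      return granted;
    },
  };
}

// Fill loop with the workaround: settle for "almost full" rather than
// begging forever for 1-3 samples that extract() will never grant.
function fillBuffer(sound) {
  let numSamples = 0;
  let startSample = -1; // -1 = continue from the current position
  while (numSamples < BUFFER_SIZE - MINIMUM_SAMPLE_REQUEST) {
    numSamples += sound.extract(BUFFER_SIZE - numSamples, startSample);
    startSample = 0;
  }
  return numSamples;
}

console.log(fillBuffer(makeSound(8191))); // 8191, and no infinite loop
```

With a "file" one sample short of the buffer, the old `numSamples < BUFFER_SIZE` condition would spin forever (request 1, receive 0); this version settles for 8191 samples and exits.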

But you're not fine, you're still screwed, because of:

Problem Number 2: when you add your 6204 bytes to your 1987 bytes and write that ByteArray to the sound stream, you get "RangeError: Error #2004: One of the parameters is invalid", which error message, of course, is enthroned in the great pantheon of Really Helpful Error Messages Thank You Very Much. Like, not even maybe a stack trace? Or, like, maybe mention who didn't like the parameters? Or, get this, which parameters? If I had a little more time I would point out that there are thousands of parameters being passed around the program, only some of which I'm privy to, and in fact I might, if I had the time, calculate the efficiency at which one could approach figuring out which parameter is invalid and that it is, in fact, rather poor, and also could then extrapolate that to estimate the time spent by your average programmer trying to figure it out and show with some rigor that that time is measured in hours, not minutes, and in fact may bleed into the lives of other good-hearted programmers on various StackExchange-like message boards who unwittingly may try to help, and could furthermore argue, if one thinks about it, that all of these man-hours have a cost, financial and otherwise, inasmuch as some involved may be gainfully employed but more importantly time spent on this earth is sadly finite and has an innate value of its own, and that all lives born up in this travesty of an error message are irrevocably shortened and made worse by it, and I might conclude that the true cost of this error message is not measured in money, nor man-hours, nor inefficiency, but rather human tears and despair, but no, there is unfortunately no time for that analysis to be done, I'm afraid.

Instead I determined the error was coming from event.data.writeBytes(), which was receiving way way too many bytes of samples. Like, four times as many. Exactly four times. But it's stranger. If you read BUFFER_SIZE samples correctly, the ByteArray ends up correctly sized (BUFFER_SIZE * 8 bytes). But if you reach the end, and Sound.extract() reports it wrote 1987 samples, the ByteArray really received 63584 bytes, which, if we were in sane-happy-land, would be 7948 samples. But we're not in that fabled land where things work like you expect them to and it's safe to make certain assumptions, so we don't know which it is: did we actually get more samples, or did all of the samples take up 32 bytes (!!) each, or is it a misreporting, or is this all just a dream and when I wake up I'll be in sane-happy-land, or is the land you wake up in even worse? The possibilities are too fearsome to contemplate, and getting to the bottom of it - I suppose by looking through the byte array, or analyzing the sound produced - is beyond the scope of my life.
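For the record, the arithmetic behind "exactly four times," using the numbers above:

```javascript
// The byte counts from the 1987-sample case above.
const BYTES_PER_SAMPLE = 8;    // two 32-bit floats: left + right channel
const reportedSamples = 1987;  // Sound.extract()'s return value
const actualBytes = 63584;     // what the ByteArray really received

const expectedBytes = reportedSamples * BYTES_PER_SAMPLE;
console.log(expectedBytes);                  // 15896
console.log(actualBytes / expectedBytes);    // 4 -- exactly four times too many
console.log(actualBytes / BYTES_PER_SAMPLE); // 7948 samples, if the bytes are real
```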

Things were looking grim for our hero. On a whim I thought, what if there's something wrong with the file? I produced it with professional hardware and software, but maybe there's some kind of error in it that's killing Flash, and me, slowly. But before checking the source I went into Flash Professional and messed around with the encoding of the file, changing it first from "Default Encoding" to RAW, stereo, no compression. And it worked, flawlessly. It even extracted odd, non-multiple-of-four sample counts. So I tried another encoding, MP3 160kb/s 44.1 stereo, and that worked as well. Then I started writing this blog post, later still you started reading it, and here we are.

I don't know exactly why the encoding might cause such a strange thing: generally correct extraction until the end of the file when you get QUAD DAMAGE or something. And that's one of those things I'm just going to file under things-i-don't-even-want-to-know-the-answer-to-i-just-want-them-to-go-away and hopefully there it shall remain evermore.

TL;DR: For reasons yet undiscovered, Sound.extract() may report an incorrect number of samples read (while writing four times the expected bytes) when you reach the end of a sound file with certain encodings. Solution: re-encode the file.

The Kitchen Sync

What genius he employs in selecting titles for posts! Bravo!

I.

Notes on syncing animations and sound in Flash.

  • Pause/Resume: There's no way to pause a playing sound, only stop it. So to pause you have to store the time at which you stop it; then, to unpause, you start the sound playing at that time. Great (?), but apparently Flash isn't real concerned about coming in on time, so it may start a few ms early. Which wouldn't be a problem unless you were counting on the next update happening after the previous one, in which case you're plum out of luck. In my case I'm doing a calculation to see how much time has passed in a looping sound. If the current playhead position is less than the last checked position, I assume the sound has looped and calculate accordingly. If, say, the current position = last position – 1, then almost an entire loop has apparently gone by, which information is then used to update animations, which as you may imagine makes things really fucking wrong. The solution I used was to wait to do any updates until the stored pause position is passed, then reset the pause position to 0. But I'm not real happy with that and I'm hoping a better way will be revealed as I learn more.
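That workaround might be sketched like so (plain JavaScript rather than AS3; positions and loopLength in milliseconds, all names mine):

```javascript
// Loop-aware elapsed-time tracking with the "wait out the early resume" hack.
function makeLoopTracker(loopLength) {
  let lastPosition = 0;
  let pausePosition = 0; // position we resumed from; 0 = no pending resume

  return {
    pauseAt(position) {
      pausePosition = position;
      lastPosition = position;
    },
    // Returns elapsed ms since the last update, or null while we're
    // still waiting out a too-early resume.
    update(position) {
      // Flash may resume a few ms *before* the stored pause position,
      // which would otherwise read as a near-full loop (position <
      // lastPosition). Hold off on updates until we're past it.
      if (pausePosition > 0) {
        if (position < pausePosition) return null;
        pausePosition = 0;
      }
      let elapsed = position - lastPosition;
      if (elapsed < 0) elapsed += loopLength; // the sound wrapped around
      lastPosition = position;
      return elapsed;
    },
  };
}

const tracker = makeLoopTracker(2000); // a 2-second loop
tracker.pauseAt(500);
console.log(tracker.update(497)); // null: resumed 3ms early, ignored
console.log(tracker.update(510)); // 10: normal updates resume
```

It shares the original's weakness: any legitimate update that lands just before the pause position gets swallowed, so it's a stopgap, not a fix.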

II.

Here’s an attempt at a summary of what I’ve learned from a magnificent post at reddit (who knew it was possible?) about syncing:

  1. It is possible to make a good rhythm game with Flash. I was already coming to this conclusion, but it’s good to see another project (TREBL: Rhythm Arcade) that uses it well.
  2. We need a way to do audio and video calibration. Audio is more important, since that is (hopefully) what the user will use to aim for notes. In a video showing calibration from Rock Band creators Harmonix, the audio adjustment is under 10ms. Video adjustment is < 100ms (!) on a large TV. I don’t know yet whether it’s fair to generalize those figures or not.
  3. Audio position isn't guaranteed to update regularly. It will more likely update in a step-wise fashion, i.e. it may produce the same number multiple times in a row then jump to a larger number, but on average it will be correct. This corresponds roughly to what I'm seeing in our initial demos, where notes jitter a little bit. I will note that our jitter is very minimal, though, and is not noticeable at smaller sizes. Of course that is n=1, and the results are likely to be highly variable depending on the audio hardware and drivers.
  4. Here’s his block of code for determining song time:
    songStarted() {
        previousFrameTime = getTimer();
        lastReportedPlayheadPosition = 0;
        songTime = 0; // (not in his original, but songTime needs zeroing somewhere)
        mySong.play(); // NB: in AS3 the playhead actually lives on the SoundChannel that play() returns
    }

    everyFrame() {
        songTime += getTimer() - previousFrameTime;
        previousFrameTime = getTimer();
        if (mySong.position != lastReportedPlayheadPosition) {
            songTime = (songTime + mySong.position) / 2;
            lastReportedPlayheadPosition = mySong.position;
        }
    }

    Analysis: every frame (60 times a second), check whether the song position has updated. If it has, average it with the internal song time that we're keeping. The internal song time is advanced by the more accurate getTimer(), but it's likely to drift away from the recording, so averaging the two keeps them in check. Question: is it still possible to drift away significantly? Let's say mySong.position updates only every 100ms or so, and our internal counter has a constant drift of 10ms per 100ms. At the first update the song is at 100 but we're at 110, so we compromise at 105. At the next update the song is at 200, the internal is at 215, so we get 207.5. Then 300, 317.5 -> 308.75; then 400, 418.75 -> 409.375. At first glance it looks like the gap just keeps increasing, but each averaging halves the accumulated error before another 10ms of drift lands, so if the drift really is constant the offset converges to about the per-update drift (10ms here) instead of growing without bound. Whether the drift actually is constant is kind of a big question, though, so that bears looking into.

  5. Delays to be concerned with: audio processing -> speakers -> ears (sound in air is like 1ms/foot). Key press -> listeners in program. Graphics -> screen. As discussed before, graphics are probably the largest. Audio less so but most important. Key press registration delays are probably minimal if you use listeners. Hard to separate it from audio/visual delays in a calibrator. And if you use two separate calibrators then you may be including key delays twice. Accounting for it may take fiddling (read: uninformed hacking about with the numbers).
  6. Question: for a calibration tool, could you have a simultaneously playing audio click and visual flash, and have the user adjust a slider to line them up? Then we have an idea of when they perceive things to be simultaneous, i.e. the relative delay between sight and sound. Note it doesn't measure the absolute delay from stimulus to perception, so it's not a complete test.
  7. In the calibration process we may also have to deal with user interpretation, i.e. they may play a little behind the beat or anticipate it even when focusing intently on a simplified calibration task. In the Rock Band vid they did automatic calibration with a tool built into the guitar, which is neat, but we don't have that luxury (and besides, WHO CALIBRATES THE CALIBRATORS?)
  8. This isn't in the article, but I'm going to assert that calibration tests should be done at the indifference point, around 96bpm, a tempo where people tend to neither anticipate too much nor drag too much (per Vierordt's Law, though I don't really know whether it holds).
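Circling back to the drift question in item 4: the hypothetical numbers there (position updates every 100ms, a constant 10ms of internal drift per interval) can be simulated directly, and the offset levels off near the per-update drift rather than running away, since each averaging halves the accumulated error before the next 10ms lands:

```javascript
// Simulate the songTime-averaging scheme against a constantly drifting clock.
const updateInterval = 100; // ms between mySong.position updates (hypothetical)
const driftPerUpdate = 10;  // ms gained by the internal counter per interval (hypothetical)

let songPosition = 0; // what mySong.position reports
let songTime = 0;     // the averaged internal estimate

const errors = [];
for (let i = 0; i < 20; i++) {
  songPosition += updateInterval;              // the song advances 100ms
  songTime += updateInterval + driftPerUpdate; // our counter advances 110ms
  songTime = (songTime + songPosition) / 2;    // the averaging step
  errors.push(songTime - songPosition);
}

console.log(errors.slice(0, 4)); // [ 5, 7.5, 8.75, 9.375 ] -- the 105, 207.5, ... sequence
console.log(errors[19]);         // ~9.99999: converging to the 10ms per-update drift
```

A constant drift is only the worst case in the sense of a steady offset; random jitter would partly cancel, so the remaining question is how getTimer() actually behaves relative to the audio clock.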