From fungsin.lui at gmail.com  Mon Oct  9 07:21:46 2006
From: fungsin.lui at gmail.com (Lui Fungsin)
Date: Mon, 9 Oct 2006 00:21:46 -0700
Subject: [cl-wav-synth-devel] newbie question
Message-ID: <3990b5930610090021pdaf149fx63f426dbd6e03499@mail.gmail.com>

Hi,

I just finished watching the cl-wav-synth demo tutorial. It's way cool!

I see that this is a new project without much traffic here, so I hope
you won't mind a dumb question.

I'm clueless about audio, the WAV file format, etc.
However, there's a simple task that I want to try my hand at with the
cl-wav-synth library.

Here are two sound files for some Chinese words. Some words have more
than one pronunciation (like the first file below), while most of the
others have only one.

http://209.172.124.170/pub/two_tone.wav
http://209.172.124.170/pub/single_tone.wav

Is it possible to programmatically detect whether there's a voice
uttered at the beginning of a WAV file, then a short period of silence,
and then another voice uttered?

If this is the case, I want to split that into two files (break at the
silence). Otherwise I can just leave it alone.

If this can be done, I'd greatly appreciate it if someone could briefly
describe the procedure or point me in the right direction (a URL to
read, etc.).

Many thanks.
fungsin


From hocwp at free.fr  Mon Oct  9 19:18:44 2006
From: hocwp at free.fr (Philippe Brochard)
Date: Mon, 09 Oct 2006 21:18:44 +0200
Subject: [cl-wav-synth-devel] newbie question
In-Reply-To: <3990b5930610090021pdaf149fx63f426dbd6e03499@mail.gmail.com> (Lui
	Fungsin's message of "Mon, 9 Oct 2006 00:21:46 -0700")
References: <3990b5930610090021pdaf149fx63f426dbd6e03499@mail.gmail.com>
Message-ID: <871wphjo17.fsf@grigri.elcforest>

Lui Fungsin writes:

> Hi,
>
Hi, thanks a lot for your interest in cl-wav-synth!

> I just finished watching the cl-wav-synth demo tutorial. It's way cool!
>
thanks :)

> I see that this is a new project without much traffic here, so I hope
> you won't mind a dumb question.
>
> I'm clueless about audio, the WAV file format, etc.
> However, there's a simple task that I want to try my hand at with the
> cl-wav-synth library.
>
> Here are two sound files for some Chinese words. Some words have more
> than one pronunciation (like the first file below), while most of the
> others have only one.
>
> http://209.172.124.170/pub/two_tone.wav
> http://209.172.124.170/pub/single_tone.wav
>
> Is it possible to programmatically detect whether there's a voice
> uttered at the beginning of a WAV file, then a short period of silence,
> and then another voice uttered?
>
> If this is the case, I want to split that into two files (break at the
> silence). Otherwise I can just leave it alone.
>
> If this can be done, I'd greatly appreciate it if someone could briefly
> describe the procedure or point me in the right direction (a URL to
> read, etc.).
>
Here is how I would write this (load it from SLIME or the CLIM REPL):

--------------------------------------------------
(in-package :wav)

(defun find-peak (sample &optional (max-level 5000) (min-level 100) (min-index 1000))
  "Count the tones (peaks) in SAMPLE. Return two values: the tone count
  and a list holding, for each tone, the index at which the silence
  following it was detected."
  (with-slots (data) sample
    (let ((count 0)
          (find-max nil)   ; true once a value louder than MAX-LEVEL was seen
          (find-min 0)     ; length of the current run of quiet values
          (acc nil))       ; detection indexes, most recent first
      (loop for value across data
            for index from 0 do
            (cond ((> (abs value) max-level)
                   (setf find-max t
                         find-min 0))
                  ((< (abs value) min-level)
                   (incf find-min)
                   ;; A tone is counted once it is followed by more
                   ;; than MIN-INDEX consecutive quiet values.
                   (when (and find-max (> find-min min-index))
                     (incf count)
                     (setf find-max nil)
                     (push index acc)))
                  (t (setf find-min 0))))
      (values count (nreverse acc)))))
--------------------------------------------------


Then in the clim REPL:

WAV> Load As Sample (pathname) single_tone.wav
WAV> (with-sample (find-peak it))
0 1
1 (17525)

WAV> Load As Sample (pathname) two_tone.wav
WAV> (with-sample (find-peak it))
0 2
1 (23303 60504)

WAV> (set-sample (mix it (delay it 4)))
WAV> (with-sample (find-peak it))
0 4
1 (23303 60504 111503 148704)


The first value is the number of tones in the file.
The second value is a list with one index per tone, taken in the
silence that follows the tone, at the point where it was detected.

You can then do whatever you want with these values.

For example to isolate the first tone:

WAV> (set-sample (cut-i it 0 23303))
WAV> (with-sample (write-sample "first-tone.wav" it))


To isolate the second tone:

WAV> (set-sample (cut-i it 23303 60504))

Etc...

And if you want to automate this and save a file per tone:

--------------------------------------------------
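(with-sample
  (multiple-value-bind (total-count index)
      (find-peak it)
    (declare (ignore total-count))
    ;; Each tone runs from the previous detection index S (0 for the
    ;; first tone) to its own detection index E.
    (loop for i in index
          for s = 0 then e
          for e = i
          for count from 0
          do (write-sample (format nil "tone-~A.wav" count)
                           (cut-i it s e)))))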
--------------------------------------------------
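To make the s/e stepping in that loop easier to follow, here is a small
stand-alone sketch of the same idea on a plain list of indexes. The
name tone-ranges is hypothetical (it is not part of cl-wav-synth); it
only shows how each tone's start is the previous tone's end.

```lisp
;; Sketch only: TONE-RANGES is a hypothetical helper, not cl-wav-synth API.
(defun tone-ranges (indexes)
  "Turn a list of detection indexes into a list of (start end) pairs.
The first range starts at 0; every later range starts where the
previous one ended."
  (loop for end in indexes
        for start = 0 then previous-end
        for previous-end = end
        collect (list start end)))

;; With the indexes reported for two_tone.wav:
;; (tone-ranges '(23303 60504)) => ((0 23303) (23303 60504))
```

Each pair could then be handed to cut-i in the same way as s and e
above.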


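Note: a sample is just a wav header (bits per sample...) and a big
array of data.

You can adjust the levels:
  - Max level and min level are the detection thresholds: a value above
    max level (in absolute value) counts as part of a tone, and a value
    below min level counts as silence.
  - Min index is the minimum length of the silence, counted in sample
    indexes.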
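To get a feel for these three settings without loading a wav file, the
same detection logic can be tried on a plain vector. This is only a
sketch: count-tones and *demo-signal* are hypothetical names, and the
function takes a vector of amplitudes instead of a cl-wav-synth sample.

```lisp
;; Sketch only: COUNT-TONES and *DEMO-SIGNAL* are hypothetical names.
(defun count-tones (data &key (max-level 5000) (min-level 100) (min-index 1000))
  "Return the tone count and the indexes at which the silence after
each tone became longer than MIN-INDEX values."
  (let ((count 0) (loud-seen nil) (quiet-run 0) (ends '()))
    (loop for value across data
          for index from 0
          do (cond ((> (abs value) max-level)
                    (setf loud-seen t quiet-run 0))
                   ((< (abs value) min-level)
                    (incf quiet-run)
                    (when (and loud-seen (> quiet-run min-index))
                      (incf count)
                      (setf loud-seen nil)
                      (push index ends)))
                   (t (setf quiet-run 0))))
    (values count (nreverse ends))))

;; Synthetic signal: loud, 5 quiet values, loud, 5 quiet values.
(defparameter *demo-signal*
  (coerce '(9000 9000 0 0 0 0 0 9000 9000 0 0 0 0 0) 'vector))

;; (count-tones *demo-signal* :min-index 3)  => 2, (5 12)
;; (count-tones *demo-signal* :min-index 10) => 0, NIL
```

With a min index of 10 neither silence is long enough, so nothing is
counted. This also shows that a tone is only counted if enough silence
follows it before the data ends.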


> Many thanks.
>
I hope that helps.

> fungsin
>
Philippe

-- 
Philippe Brochard    <hocwp at free.fr>
                      http://hocwp.free.fr

-=-= http://www.gnu.org/home.fr.html =-=-


From fungsin.lui at gmail.com  Mon Oct 23 03:55:57 2006
From: fungsin.lui at gmail.com (Lui Fungsin)
Date: Sun, 22 Oct 2006 20:55:57 -0700
Subject: [cl-wav-synth-devel] newbie question
In-Reply-To: <871wphjo17.fsf@grigri.elcforest>
References: <3990b5930610090021pdaf149fx63f426dbd6e03499@mail.gmail.com>
	<871wphjo17.fsf@grigri.elcforest>
Message-ID: <3990b5930610222055y557b08a8k9c9f22ff4bab4a6b@mail.gmail.com>

On 10/9/06, Philippe Brochard <hocwp at free.fr> wrote:
> Here is how I would write this (load it from SLIME or the CLIM REPL):
>

Hi Philippe,

This works well for me. Thanks!

BTW, while parsing the pronunciation files I have, I enhanced the wav
header parsing method a bit so that it skips other miscellaneous header
chunks.

With this patch I'm able to read all of the 10000+ wav samples I have.

Attached is the diff.
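
As a rough, stand-alone illustration of the idea (not the patch itself),
the sketch below walks the chunks of a wav file held in a byte vector
and lists every chunk up to the data chunk. All names in it are
hypothetical and none of them is cl-wav-synth API. It also assumes the
RIFF convention that an odd-sized chunk body is followed by one pad
byte, which the attached diff does not handle.

```lisp
;; Sketch only: every name below is hypothetical, not cl-wav-synth API.
(defun u32-le (bytes pos)
  "Read a little-endian unsigned 32-bit integer from BYTES at POS."
  (logior (aref bytes pos)
          (ash (aref bytes (+ pos 1)) 8)
          (ash (aref bytes (+ pos 2)) 16)
          (ash (aref bytes (+ pos 3)) 24)))

(defun chunk-id (bytes pos)
  "Read the four-character chunk identifier stored at POS."
  (map 'string #'code-char (subseq bytes pos (+ pos 4))))

(defun list-chunks (bytes)
  "Return a list of (id size) for each chunk after the RIFF/WAVE
header, stopping once the data chunk has been seen."
  (assert (string= (chunk-id bytes 0) "RIFF"))
  (assert (string= (chunk-id bytes 8) "WAVE"))
  (let ((pos 12) (chunks '()))
    (loop
      (when (> (+ pos 8) (length bytes)) (return))
      (let ((id (chunk-id bytes pos))
            (size (u32-le bytes (+ pos 4))))
        (push (list id size) chunks)
        (when (string= id "data") (return))
        ;; Assumption: odd-sized chunk bodies carry one pad byte.
        (setf pos (+ pos 8 size (mod size 2)))))
    (nreverse chunks)))

(defun ascii (string)
  "Return the character codes of STRING as a list."
  (map 'list #'char-code string))

;; Tiny hand-built file: fmt chunk, a 3-byte bext chunk plus its pad
;; byte, then the data chunk. The RIFF size field is left at zero
;; because this sketch never reads it.
(defparameter *demo-wav*
  (coerce (append (ascii "RIFF") '(0 0 0 0) (ascii "WAVE")
                  (ascii "fmt ") '(16 0 0 0)
                  '(1 0 1 0 68 172 0 0 136 88 1 0 2 0 16 0)
                  (ascii "bext") '(3 0 0 0) '(1 2 3 0)
                  (ascii "data") '(4 0 0 0) '(0 0 0 0))
          'vector))

;; (list-chunks *demo-wav*) => (("fmt " 16) ("bext" 3) ("data" 4))
```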

-- fungsin
-------------- next part --------------
Index: cl-wav-synth.lisp
===================================================================
--- cl-wav-synth.lisp	(revision 910)
+++ cl-wav-synth.lisp	(working copy)
@@ -353,9 +353,10 @@
 
 (defgeneric read-header (filename header))
 (defmethod read-header (filename (header header))
+  "Read wav header info. See http://www.sonicspot.com/guide/wavefiles.html"
   (labels ((expected (read-str orig-str)
 	     (assert (string= read-str orig-str) ()
-		     "error reading header: ~S is not a wav file" filename)))
+		     "error reading header: ~S is not a wav file. Expected ~A Got ~A" filename orig-str read-str)))
     (with-slots (n-samples-per-sec
 		 n-channels n-bits-per-sample
 		 n-block-align n-avg-bytes-per-sec
@@ -365,16 +366,25 @@
 	(expected (read-id stream 4) "RIFF")
 	(read-32 stream)
 	(expected (read-id stream  4) "WAVE")
-	(expected (read-id stream 4) "fmt ")
-	(read-32 stream)
-	(read-16 stream)
-	(setf n-channels (read-16 stream))
-	(setf n-samples-per-sec (read-32 stream))
-	(setf n-avg-bytes-per-sec (read-32 stream))
-	(setf n-block-align (read-16 stream))
-	(setf n-bits-per-sample (read-16 stream))
-	(expected (read-id stream 4) "data")
-	(setf total-byte (read-32 stream)))))
+        (loop
+         (let* ((next-header (read-id stream 4))
+                (bytes (read-32 stream)))
+           (cond ((string= next-header "fmt ")
+                  (read-16 stream) ;; compression code
+                  (setf n-channels (read-16 stream)) 
+                  (setf n-samples-per-sec (read-32 stream))
+                  (setf n-avg-bytes-per-sec (read-32 stream))
+                  (setf n-block-align (read-16 stream))
+                  (setf n-bits-per-sample (read-16 stream))
+                  ;; possible extra format bytes
+                  (dotimes (i (- bytes 16)) (read-byte stream)))
+                 ((string= next-header "data")
+                  (setf total-byte bytes)
+                  (return))
+                 (t
+                  ;; There're a lot of headers that we don't
+                  ;; care. For instance, bext minf elmo, etc
+                  (dotimes (i bytes) (read-byte stream)))))))))
   header)
 
 (defgeneric print-header (header &optional comment))

From hocwp at free.fr  Thu Oct 26 15:24:25 2006
From: hocwp at free.fr (Philippe Brochard)
Date: Thu, 26 Oct 2006 17:24:25 +0200
Subject: [cl-wav-synth-devel] newbie question
In-Reply-To: <3990b5930610222055y557b08a8k9c9f22ff4bab4a6b@mail.gmail.com>
	(Lui Fungsin's message of "Sun, 22 Oct 2006 20:55:57 -0700")
References: <3990b5930610090021pdaf149fx63f426dbd6e03499@mail.gmail.com>
	<871wphjo17.fsf@grigri.elcforest>
	<3990b5930610222055y557b08a8k9c9f22ff4bab4a6b@mail.gmail.com>
Message-ID: <87d58fyuae.fsf@grigri.elcforest>


Lui Fungsin writes:

> On 10/9/06, Philippe Brochard <hocwp at free.fr> wrote:
>> Here is how I would write this (load it from SLIME or the CLIM REPL):
>>
>
> Hi Philippe,
>
> This works well for me. Thanks!
>
Ok, cool :)

> BTW, while parsing the pronunciation files I have, I enhanced the wav
> header parsing method a bit so that it skips other miscellaneous header
> chunks.
>
> With this patch I'm able to read all of the 10000+ wav samples I have.
>
> Attached is the diff.
>
Thanks a lot, this is in the cvs and in the current release.

Philippe
