audio

class janim.items.audio.Audio(file_path: str = '', begin: float = -1, end: float = -1, **kwargs)

Bases: object

Audio

The audio_channels config option controls how many channels are read (default: 2)

See also: Config

audio_cache_map: dict[tuple, tuple[ndarray, int, str, str]] = {}

copy() → Self

set_samples(data: ArrayLike) → None

read(file_path: str, begin: float = -1, end: float = -1) → Self

Read audio from a file

begin and end can be specified to read only a portion of the audio

sample_count() → int: Total number of samples

duration() → float: Duration of the audio, in seconds

clip(begin: float = 0, end: float = -1) → Self

Clip the audio

Keep only the portion between begin and end
If begin is omitted, the clip starts at the beginning of the audio
If end is omitted (-1), the clip runs to the end of the audio
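
As a rough illustration of the begin/end convention described above, the following sketch clips a plain list of samples. It does not use JAnim: clip_samples and the framerate parameter are hypothetical names, and treating begin and end as times in seconds is an assumption.

```python
def clip_samples(samples, framerate, begin=0.0, end=-1.0):
    """Keep the samples between begin and end (both in seconds).

    Hypothetical helper, not part of JAnim: end == -1 means "to the end".
    """
    start_idx = max(0, round(begin * framerate))
    if end == -1:
        stop_idx = len(samples)
    else:
        stop_idx = min(len(samples), round(end * framerate))
    return samples[start_idx:stop_idx]


# One second of dummy audio at 8 samples per second.
samples = list(range(8))
middle = clip_samples(samples, framerate=8, begin=0.25, end=0.5)  # samples 2 and 3
tail = clip_samples(samples, framerate=8, begin=0.75)             # samples 6 and 7
```

With the real class, the equivalent call would presumably be audio.clip(0.25, 0.5), which returns Self and so can be chained with the other methods.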

mul(value: float | Iterable[float]) → Self

Multiply the samples by value. value can be a single number or contain multiple elements (e.g., a list)

For example:

audio.mul(0.5) halves the amplitude (volume)
audio.mul([1, 0]) makes the audio strongest at the start and weakest at the end
audio.mul(np.sin(np.linspace(0, 2 * np.pi, audio.sample_count()))) multiplies the amplitude by one cycle of the sine function over time
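
The effect of these calls can be sketched with NumPy alone. The helper below is not JAnim's implementation: scale_samples is a hypothetical name, and stretching a multi-element value over the whole audio by linear interpolation is an assumption that happens to match the [1, 0] description.

```python
import numpy as np


def scale_samples(samples, value):
    """Multiply samples by a scalar or by an envelope stretched over the audio.

    Hypothetical helper, not JAnim's implementation: a sequence of values is
    linearly interpolated to one gain per sample (an assumption).
    """
    samples = np.asarray(samples, dtype=float)
    gains = np.atleast_1d(np.asarray(value, dtype=float))
    if gains.size == 1:
        return samples * gains[0]
    # Spread the control points evenly over the audio, then interpolate.
    positions = np.linspace(0.0, 1.0, gains.size)
    targets = np.linspace(0.0, 1.0, samples.size)
    return samples * np.interp(targets, positions, gains)


tone = np.ones(5)
halved = scale_samples(tone, 0.5)     # every sample scaled to 0.5
ramped = scale_samples(tone, [1, 0])  # gains 1.0, 0.75, 0.5, 0.25, 0.0
```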

fade_in(duration: float) → Self: Apply a fade-in lasting duration seconds

fade_out(duration: float) → Self: Apply a fade-out lasting duration seconds
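
A fade can be thought of as multiplying the first (or last) duration seconds of samples by a ramp. The sketch below shows this with NumPy on mono samples. The fade helper and its framerate parameter are hypothetical, and the linear ramp shape is an assumption; the curve JAnim actually applies is not stated here.

```python
import numpy as np


def fade(samples, framerate, duration, fade_out=False):
    """Apply a linear fade of `duration` seconds to mono samples.

    Hypothetical helper: the linear ramp shape is an assumption.
    """
    samples = np.array(samples, dtype=float)  # copy so the input is untouched
    n = min(samples.size, int(round(duration * framerate)))
    if n == 0:
        return samples
    ramp = np.linspace(0.0, 1.0, n)
    if fade_out:
        samples[-n:] *= ramp[::-1]  # gain falls from 1 to 0 over the last n samples
    else:
        samples[:n] *= ramp         # gain rises from 0 to 1 over the first n samples
    return samples


tone = np.ones(8)
faded_in = fade(tone, framerate=4, duration=1.0)
faded_out = fade(tone, framerate=4, duration=1.0, fade_out=True)
```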

recommended_ranges(*, amplitude_threshold_ratio: float = 0.02, gap_duration: float = 0.15) → Generator[tuple[float, float], None, None]

Get several usable ranges (start, end). This is generally used for voice-over audio: silent parts are ignored, and the start and end times of each segment with sound are returned

Unlike recommended_range(), this method returns several segments. For example, if the speaker pauses after one sentence and then speaks again, the audio is divided into two segments

amplitude_threshold_ratio: Amplitudes below this ratio are considered silent
gap_duration: If a silence lasts longer than this duration, the segments before and after it are separated
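
To make the roles of the two parameters concrete, here is a minimal sketch of silence-based segmentation on a plain list of mono samples. It is not JAnim's code: audible_ranges and framerate are hypothetical names, and measuring the threshold relative to the peak amplitude is an assumption.

```python
def audible_ranges(samples, framerate, amplitude_threshold_ratio=0.02, gap_duration=0.15):
    """Yield (start, end) times in seconds for the non-silent segments.

    Hypothetical helper: a sample counts as silent when its absolute value is
    below amplitude_threshold_ratio times the peak amplitude (an assumption).
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return
    threshold = peak * amplitude_threshold_ratio
    max_gap = gap_duration * framerate  # longest silence tolerated, in samples
    start = last = None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            continue
        if start is None:
            start = i
        elif i - last - 1 > max_gap:
            # The silence since the last audible sample is too long: close the segment.
            yield (start / framerate, (last + 1) / framerate)
            start = i
        last = i
    if start is not None:
        yield (start / framerate, (last + 1) / framerate)


# Two "sentences" separated by half a second of silence, at 10 samples per second.
voice = [0] * 5 + [1] * 10 + [0] * 5 + [1] * 10 + [0] * 5
segments = list(audible_ranges(voice, framerate=10))
merged = list(audible_ranges(voice, framerate=10, gap_duration=1.0))
```

Here segments holds two ranges because the pause exceeds the default gap_duration, while merged holds a single range because a one-second gap_duration tolerates the pause.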

recommended_range(*, amplitude_threshold_ratio: float = 0.02) → tuple[float, float] | None

Get a single usable range (start, end). This is generally used for voice-over audio: leading and trailing silence is ignored, and the start and end times of the part with sound are returned

Unlike recommended_ranges(), this method returns one range spanning from the start of the first segment with sound to the end of the last one

amplitude_threshold_ratio: Amplitudes below this ratio are considered silent
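
The single-range variant can be sketched as finding the first and last samples that are not silent. As before, this is only an illustration under stated assumptions: audible_range and framerate are hypothetical names, and the threshold is assumed to be relative to the peak amplitude.

```python
def audible_range(samples, framerate, amplitude_threshold_ratio=0.02):
    """Return one (start, end) range in seconds, or None if everything is silent.

    Hypothetical helper: the threshold is taken relative to the peak
    amplitude (an assumption).
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return None
    threshold = peak * amplitude_threshold_ratio
    audible = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not audible:
        return None
    return (audible[0] / framerate, (audible[-1] + 1) / framerate)


# Silence, sound, a pause, more sound, then silence, at 10 samples per second.
voice = [0] * 5 + [1] * 10 + [0] * 5 + [1] * 10 + [0] * 5
whole = audible_range(voice, framerate=10)      # the pause in the middle is kept
nothing = audible_range([0, 0, 0], framerate=10)
```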