audio

class janim.items.audio.Audio(file_path: str = '', begin: float = -1, end: float = -1, **kwargs)

Bases: object

Audio

The audio_channels config option controls the number of channels to read (default: 2)

See also: Config

audio_cache_map: dict[tuple, tuple[ndarray, int, str, str]] = {}

copy() Self

set_samples(data: ArrayLike) None

read(file_path: str, begin: float = -1, end: float = -1) Self

Read audio from file

Specify begin and end to read only a portion of the audio

sample_count() int

Total number of sample points

duration() float

Duration of the audio, in seconds
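The relationship between sample_count() and duration() can be sketched as below. This is only an illustration, not the library's code: it assumes the sample count is per channel, and `framerate` (samples per second) is a name introduced here for the example.

```python
def duration_seconds(sample_count: int, framerate: int) -> float:
    """Length of a clip in seconds, given its sample count and sample rate."""
    # framerate is a hypothetical name for the sample rate (samples per second)
    return sample_count / framerate


print(duration_seconds(88200, 44100))  # 2.0
```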

clip(begin: float = 0, end: float = -1) Self

Clip audio

  • Keep the portion between begin and end

  • If begin is omitted, the clip starts at the beginning of the audio

  • If end is omitted (-1), the clip runs to the end of the audio
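The begin/end semantics above can be illustrated with a plain-Python sketch on a list of samples. It is not the library's implementation; `clip_samples` and `framerate` are hypothetical names, and the conversion from seconds to sample indices by truncation is an assumption.

```python
def clip_samples(samples, framerate, begin=0.0, end=-1.0):
    """Keep the samples between begin and end (both in seconds).

    end == -1 is treated as "to the end", mirroring the default described above.
    """
    start_idx = max(0, int(begin * framerate))
    if end == -1:
        stop_idx = len(samples)
    else:
        stop_idx = min(len(samples), int(end * framerate))
    return samples[start_idx:stop_idx]


samples = list(range(8))  # 8 samples at a hypothetical rate of 4 per second = 2 s
print(clip_samples(samples, 4, 0.5, 1.5))  # [2, 3, 4, 5]
print(clip_samples(samples, 4, begin=1.0))  # [4, 5, 6, 7]
```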

mul(value: float | Iterable[float]) Self

Multiply the samples by the given value; value can contain multiple elements (e.g., a list)

For example:

  • audio.mul(0.5) halves the volume (amplitude)

  • audio.mul([1, 0]) makes the volume strongest at the start and weakest at the end

  • audio.mul(np.sin(np.linspace(0, 2 * np.pi, audio.sample_count()))) multiplies the volume by one cycle of the sin function over time
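To make the effect of a scalar versus a list concrete, here is a minimal plain-Python sketch of a gain envelope. It assumes a short list is stretched across the whole clip by linear interpolation, which the text above suggests but does not state; `stretch` and `apply_gain` are hypothetical helpers, not part of the library.

```python
from collections.abc import Iterable


def stretch(values, n):
    """Linearly interpolate `values` onto n evenly spaced points."""
    values = list(values)
    if len(values) == 1 or n == 1:
        return [float(values[0])] * n
    out = []
    for i in range(n):
        pos = i * (len(values) - 1) / (n - 1)  # position in `values` space
        lo = min(int(pos), len(values) - 2)    # left neighbour index
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[lo + 1] * frac)
    return out


def apply_gain(samples, value):
    """Multiply each sample by a constant or by a stretched envelope."""
    if isinstance(value, Iterable):
        gains = stretch(value, len(samples))
    else:
        gains = [value] * len(samples)
    return [s * g for s, g in zip(samples, gains)]


print(apply_gain([1.0] * 4, 0.5))     # [0.5, 0.5, 0.5, 0.5]
print(apply_gain([1.0] * 5, [1, 0]))  # [1.0, 0.75, 0.5, 0.25, 0.0]
```

In both cases only the amplitude of each sample changes; the number of samples, and therefore the playback speed and pitch, stays the same.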

fade_in(duration: float) Self

Apply duration seconds of fade-in

fade_out(duration: float) Self

Apply duration seconds of fade-out
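A fade can be thought of as a gain ramp applied to the first or last duration seconds of the clip. The sketch below assumes a linear ramp, which the documentation does not specify; `linear_fade_in`, `linear_fade_out`, and `framerate` are hypothetical names used only for illustration.

```python
def linear_fade_in(samples, framerate, duration):
    """Ramp the gain linearly from 0 up towards 1 over the first `duration` seconds."""
    n = min(len(samples), int(duration * framerate))
    out = list(samples)
    for i in range(n):
        out[i] *= i / n
    return out


def linear_fade_out(samples, framerate, duration):
    """Mirror image of the fade-in: ramp down to 0 over the last `duration` seconds."""
    return linear_fade_in(samples[::-1], framerate, duration)[::-1]


tone = [1.0] * 8  # 2 seconds at a hypothetical rate of 4 samples per second
print(linear_fade_in(tone, 4, 1.0))   # [0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0, 1.0]
print(linear_fade_out(tone, 4, 1.0))  # [1.0, 1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0]
```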

recommended_ranges(*, amplitude_threshold_ratio: float = 0.02, gap_duration: float = 0.15) Generator[tuple[float, float], None, None]

Get several usable ranges (start, end), generally used for voice-over audio, i.e., ignores silent parts and gets the start and end times of segments with sound

The difference from recommended_range() is that this method returns several segments. For example, if the speaker pauses after one sentence and then resumes, the audio is divided into two segments

  • amplitude_threshold_ratio: Amplitudes below this ratio are considered silent

  • gap_duration: If a stretch of silence lasts longer than this duration, the segments before and after it are separated
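How the two parameters interact can be sketched in plain Python. This is an illustration of the idea, not the library's algorithm: it assumes the threshold is taken relative to the peak amplitude of the clip, and `sound_ranges` and `framerate` are hypothetical names.

```python
def sound_ranges(samples, framerate, amplitude_threshold_ratio=0.02, gap_duration=0.15):
    """Yield (start, end) times in seconds of the non-silent segments."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return  # entirely silent: nothing to yield
    threshold = peak * amplitude_threshold_ratio  # assumed: ratio of the peak
    gap_samples = gap_duration * framerate
    start = last_loud = None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            continue  # silent sample
        if start is None:
            start = i
        elif i - last_loud - 1 > gap_samples:
            # the silence since the last loud sample exceeds the gap: split here
            yield (start / framerate, (last_loud + 1) / framerate)
            start = i
        last_loud = i
    if start is not None:
        yield (start / framerate, (last_loud + 1) / framerate)


# Hypothetical rate of 4 samples per second; a one-sample pause is bridged,
# while a four-sample pause (1 s > 0.5 s) splits the audio into two ranges.
voice = [0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0]
print(list(sound_ranges(voice, 4, gap_duration=0.5)))  # [(0.5, 1.5), (2.5, 3.0)]
```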

recommended_range(*, amplitude_threshold_ratio: float = 0.02) tuple[float, float] | None

Get a usable range (start, end), generally used for voice-over audio, i.e., ignores silent parts and gets the start and end times of segments with sound

The difference from recommended_ranges() is that this method returns a single range, spanning from the start of the first segment with sound to the end of the last one

  • amplitude_threshold_ratio: Amplitudes below this ratio are considered silent
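The single-range variant amounts to trimming leading and trailing silence. A minimal sketch, again assuming the threshold is relative to the peak amplitude and using the hypothetical names `sound_range` and `framerate`:

```python
def sound_range(samples, framerate, amplitude_threshold_ratio=0.02):
    """Return (start, end) in seconds from the first to the last non-silent sample,
    or None when the whole clip is silent."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return None
    threshold = peak * amplitude_threshold_ratio  # assumed: ratio of the peak
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    return (loud[0] / framerate, (loud[-1] + 1) / framerate)


voice = [0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0]
print(sound_range(voice, 4))  # (0.5, 3.0)
```

Pauses inside the range are kept; only the silence before the first sound and after the last sound is excluded.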