Digital Speech Processing

Welcome to my repository, which contains coursework for the Digital Speech Processing course offered at the University of Tehran. It includes the code for the assignments and projects completed throughout the course. The course was taught by:

Dr. Hadi Veisi

Course Description

The course covers a broad range of topics including:

Speech Production and Perception: Understanding speech signals, articulatory and acoustic phonetics, and the analysis of phonemes and syllables in both Persian and English.
Digital Signal Processing: Fundamentals of signal processing, including Fourier and Z transforms.
Statistics and Probability: Basic principles of probability theory, various distributions, and estimation techniques.
Speech Signal Representation: Techniques such as the source-filter model, Short-Time Fourier Transform (STFT), Linear Predictive Coding (LPC), cepstral analysis, and Mel-Frequency Cepstral Coefficients (MFCC).
Machine Learning and Deep Learning: Introduction to machine learning concepts and neural networks, including Perceptrons, Multi-Layer Perceptrons (MLP), Autoencoders, Convolutional Neural Networks (CNN), Generative Adversarial Networks (GAN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks, Attention Mechanisms, Transformers, BERT, and GPT.
Speech Recognition: Challenges, evaluation methods, and recognition techniques, including Dynamic Time Warping (DTW), Artificial Neural Networks (ANN), Hidden Markov Models (HMM), and deep learning approaches.
Speech Synthesis: Methods and issues in text-to-speech synthesis, including formant synthesis, concatenative synthesis, statistical parametric synthesis, and deep learning-based synthesis.
Speech Enhancement: Techniques for single-channel speech enhancement, noise recognition, evaluation metrics, spectral subtraction methods, and enhancement using HMM and deep learning.
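
As an illustration of one technique from the list above, Dynamic Time Warping (DTW) aligns two sequences of different lengths by finding the cheapest monotonic path through a pairwise cost grid. The sketch below is a minimal textbook version, not code taken from the course assignments; the function name `dtw_distance` and the use of absolute difference as the local cost on scalar sequences are assumptions made for illustration (real speech systems typically compare feature vectors such as MFCC frames).

```python
def dtw_distance(a, b):
    """Return the DTW alignment cost between two sequences of numbers.

    Illustrative sketch: local cost is |x - y|, and each step may advance
    in `a`, in `b`, or in both (the classic symmetric step pattern).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


if __name__ == "__main__":
    # The second sequence repeats one sample, so warping absorbs the difference.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because a repeated sample can be matched to the same point in the other sequence, a time-stretched copy of a pattern has zero cost here, which is the property that makes DTW useful for comparing utterances spoken at different speeds.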

Table of Contents

HW1/: Introduces the concept of Word Error Rate (WER) and implements a concatenative text-to-speech model designed for digit pronunciation.
HW2/: Covers signal processing fundamentals: questions on the Fourier Transform, an implementation of the Discrete-Time Fourier Transform (DTFT), and the design of a low-pass filter. It also includes a Persian audio chatbot that connects to ChatGPT.
HW3/: Explores windowing techniques, including rectangular, Hann, cosine, and Hamming windows, and performs audio digit recognition using feature extraction methods such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC), and Zero-Crossing Rate (ZCR).
HW4/: Consists of three tasks: implementing a Multi-Layer Perceptron (MLP) from scratch for audio digit recognition, developing a Convolutional Neural Network (CNN) for music classification, and training a logistic regression model for speaker recognition.
HW5/: Involves calculating gradients by hand to deepen understanding of optimization, fine-tuning the Whisper model for Persian, and implementing a Long Short-Term Memory (LSTM) model for music genre recognition.
Project/: Implements Persian speech emotion recognition. It includes training and evaluating multiple models, such as HuBERT and Wav2Vec, to improve recognition performance.
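
For readers unfamiliar with the Word Error Rate mentioned under HW1, it is commonly defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognizer's hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the code in HW1/; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative sketch: words are obtained by splitting on whitespace, and
    the edit distance is computed with a rolling-row Levenshtein table.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning ref[:i] with an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("too") and one deletion ("four") over four reference words.
    print(word_error_rate("one two three four", "one too three"))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, since the denominator is the reference length only.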

Disclaimer

This repository is for archival and reference purposes only. The code here might not be updated or maintained. Use it at your own discretion.

MohammadJRanjbar/Digital-Speech-Processing

Folders and files

Latest commit

History

Repository files navigation

Digital Speech Processing

Course Description

Table of Contents

Disclaimer

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages