MohammadJRanjbar/Digital-Speech-Processing

Digital Speech Processing

Welcome to my repository, which contains coursework for the Digital Speech Processing course offered at the University of Tehran. This repository includes code for assignments and projects completed throughout the course. The course was instructed by:

Course Description

The course covers a broad range of topics including:

  • Speech Production and Perception: Understanding speech signals, articulatory and acoustic phonetics, and the analysis of phonemes and syllables in both Persian and English.

  • Digital Signal Processing: Fundamentals of signal processing, including Fourier and Z transforms.

  • Statistics and Probability: Basic principles of probability theory, various distributions, and estimation techniques.

  • Speech Signal Representation: Techniques such as the source-filter model, Short-Time Fourier Transform (STFT), Linear Predictive Coding (LPC), cepstral analysis, and Mel-Frequency Cepstral Coefficients (MFCC).

  • Machine Learning and Deep Learning: Introduction to machine learning concepts, neural networks (including Perceptrons, Multi-Layer Perceptrons (MLP), Autoencoders, Convolutional Neural Networks (CNN), Generative Adversarial Networks (GAN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks, Attention Mechanisms, Transformers, BERT, and GPT).

  • Speech Recognition: Challenges, evaluation methods, recognition techniques, including Dynamic Time Warping (DTW), Artificial Neural Networks (ANN), Hidden Markov Models (HMM), and deep learning approaches.

  • Speech Synthesis: Methods and issues in text-to-speech synthesis, including formant synthesis, concatenative synthesis, statistical parametric synthesis, and deep learning-based synthesis.

  • Speech Enhancement: Techniques for single-channel speech enhancement, noise recognition, evaluation metrics, spectral subtraction methods, and enhancement using HMM and deep learning.
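As a small illustration of one recognition technique named above, Dynamic Time Warping (DTW) aligns two sequences of different lengths by finding the cheapest monotonic path through their pairwise distances. The sketch below is a minimal pure-Python version and is not taken from the course assignments. The function name `dtw_distance`, the absolute-difference local cost, and the use of scalar features are all assumptions made for the example; real systems typically compare vectors of features such as MFCCs.

```python
def dtw_distance(a, b):
    """Return the DTW alignment cost between two 1-D numeric sequences.

    Assumptions for this sketch: the local cost is |a[i] - b[j]|, and the
    allowed steps are match, insertion, and deletion with no path constraints.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also aligned with an earlier b
                cost[i][j - 1],      # b[j-1] also aligned with an earlier a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# A time-stretched copy of the same pattern aligns at zero cost.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

In a template-based digit recognizer, the unknown utterance would be compared against one stored template per digit and assigned the label of the template with the lowest cost.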

Table of Contents

Please find below a brief overview of the contents of this repository:

  1. HW1/: This homework introduces the concept of Word Error Rate (WER) and includes the implementation of a concatenative text-to-speech model specifically designed for digit pronunciation.

  2. HW2/: This homework assignment focuses on several key aspects of signal processing fundamentals. It includes questions related to the Fourier Transform, the implementation of the Discrete-Time Fourier Transform (DTFT), and the development of a low-pass filter. Additionally, the assignment involves implementing a Persian audio chatbot that connects to ChatGPT.

  3. HW3/: In this homework, we explore various windowing techniques, including rectangular, Hann, cosine, and Hamming windows. Additionally, we focus on audio digit recognition using different feature extraction methods such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC), and Zero-Crossing Rate (ZCR).

  4. HW4/: This homework consists of several key tasks: implementing a Multi-Layer Perceptron (MLP) from scratch for audio digit recognition, developing a Convolutional Neural Network (CNN) for music classification, and training a logistic regression model for speaker recognition.

  5. HW5/: This homework involves calculating gradients by hand to deepen understanding of optimization techniques. It also includes fine-tuning the Whisper model for Persian language processing and implementing an LSTM (Long Short-Term Memory) model for music genre recognition.

  6. Project/: The final project implements a model for Persian speech emotion recognition. It includes training and evaluating multiple models, such as HuBERT and Wav2Vec, to improve performance on the speech emotion recognition task.
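For readers unfamiliar with the Word Error Rate mentioned under HW1, it is usually computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch of that definition and not the code from HW1. The function name `word_error_rate` and the whitespace tokenization are assumptions made for the example.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption for this sketch: words are separated by whitespace and
    compared exactly (no case folding or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("two" -> "too") and one deletion ("four") over 4 words.
wer = word_error_rate("one two three four", "one too three")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.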

Disclaimer

This repository is for archival and reference purposes only. The code here might not be updated or maintained. Use it at your own discretion.

About

Assignments and projects from the Digital Speech Processing course offered at the University of Tehran.