Software Engineer - REVE Systems
February 2023 - Present
Dhaka, Bangladesh
- Deployed a speech-based gender and age prediction model, achieving 93% and 74% accuracy, respectively.
- Optimized a VITS text-to-speech model, custom-trained on Bengali datasets, for ONNX, doubling inference speed for rapid cross-platform deployment.
- Developed a YourTTS-based multi-speaker model, enabling diverse voice synthesis with a single model.
- Enhanced ASR by training NVIDIA NeMo's QuartzNet on Bengali datasets and decoding with beam search, improving prediction accuracy.
- Built a speaker diarization system using NVIDIA TitaNet-L, identifying speakers in multi-party conversations with over 80% accuracy.
- Implemented a 3x faster FastSpeech 2 pipeline integrating TorchServe, WebSocket, and Redis for efficient batch processing.
- Clustered noise samples without supervision using K-means, t-SNE, and silhouette scoring to inform future noise augmentation strategies.
- Developed a three-layer biometric authentication system with speaker, facial, and fingerprint verification, using ROC and AUC analysis to set decision thresholds.
- Created a NISQA-based speech quality evaluation tool for quantitative assessment of speech signal integrity and clarity.
- Developed a Keycloak SPI for integrating face and fingerprint authentication, enhancing security with biometric verification.
Technologies: Python, FastAPI, scikit-learn, TensorFlow, PyTorch, TensorBoard, MLflow, Redis, TorchServe, WebSocket, MongoDB, Linux, Lambda Labs, Java, Prompt Engineering
Junior Software Engineer - REVE Systems
September 2022 - January 2023
Dhaka, Bangladesh
- Implemented audio analysis techniques for pitch detection, volume normalization, and energy calculation.
- Doubled voice activity detection accuracy across use cases by migrating from WebRTC VAD to Silero VAD.
- Enhanced audio quality using DeepFilterNet and Meta's pretrained speech enhancement models for noise reduction.
- Optimized audio data processing speed by a factor of 5 using Python's multiprocessing.
- Configured Azure and Google Cloud STT/TTS APIs for comprehensive speech-to-text and text-to-speech evaluations.
- Applied multi-speaker speech separation using pretrained Conv-TasNet and DPRNN models.
Technologies: Python, FastAPI, Matplotlib, Seaborn, NumPy, Pandas, GCP
Trainee Software Engineer - REVE Systems
April 2022 - August 2022
Dhaka, Bangladesh
- Engineered a Bengali Text Normalizer and trained a FastSpeech 2 TTS model to support Bengali language processing.
- Gained proficiency in containerization with Docker and API development with Flask.
- Performed audio data analysis and signal processing, focusing on data visualization for insights.
Technologies: JavaScript, Python, Docker, Flask, Postman, GitHub
Intern - SELISE rockin' software
May 2019 - June 2019
Dhaka, Bangladesh
- Built an online quiz platform; front-end in HTML, CSS, JavaScript, Bootstrap; back-end in PHP, Ajax, MySQL.
- Learned Git, Scrum, and SDLC practices; built admin controls for question curation.
- Implemented registration for timed student tests and performance comparison.
Technologies: HTML, CSS, JavaScript, Bootstrap, PHP, Ajax, MySQL