<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[TechTrail]]></title><description><![CDATA[TechTrail]]></description><link>https://techtrail.tuhindutta.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1753638509901/b4cd13d7-1900-403d-8229-375539a4170c.png</url><title>TechTrail</title><link>https://techtrail.tuhindutta.com</link></image><generator>RSS for Node</generator><lastBuildDate>Wed, 15 Apr 2026 00:22:53 GMT</lastBuildDate><atom:link href="https://techtrail.tuhindutta.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Turn Your Portfolio Into an AI Chatbot]]></title><description><![CDATA[People build resumes. Some build portfolio websites. A few create Notion pages.Most of it gets skimmed and forgotten.
A conversational portfolio is different.It talks. It adapts. It answers exactly what someone cares about.And you don’t need to manag...]]></description><link>https://techtrail.tuhindutta.com/turn-your-portfolio-into-an-ai-chatbot</link><guid isPermaLink="true">https://techtrail.tuhindutta.com/turn-your-portfolio-into-an-ai-chatbot</guid><category><![CDATA[huggingface]]></category><category><![CDATA[groq]]></category><category><![CDATA[AI Chatbot]]></category><category><![CDATA[portfolio]]></category><category><![CDATA[llm]]></category><category><![CDATA[conversational-ai]]></category><category><![CDATA[personal branding]]></category><category><![CDATA[huggingface-spaces]]></category><category><![CDATA[api integration]]></category><dc:creator><![CDATA[Tuhin Kumar Dutta]]></dc:creator><pubDate>Sun, 04 Jan 2026 06:03:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767541452692/f42af618-a975-4df5-bc65-68e68f38d205.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>People build resumes. Some build portfolio websites. A few create Notion pages.<br />Most of it gets skimmed and forgotten.</p>
<p>A conversational portfolio is different.<br />It talks. It adapts. It answers exactly what someone cares about.<br />And you don’t need to manage servers, rent GPUs, or write complex infra scripts to do it.</p>
<p>In this guide, I’ll show you exactly how to duplicate my <strong>Virtual Portfolio Chatbot</strong> hosted on Hugging Face Spaces and turn it into your own digital twin.</p>
<hr />
<h2 id="heading-what-youll-get">What You’ll Get</h2>
<p>By the end, you will have:</p>
<ul>
<li><p>A ready-to-use AI chatbot hosted on Hugging Face</p>
</li>
<li><p>Backed by Groq API</p>
</li>
<li><p>Customized with your achievements, knowledge, tone, personality</p>
</li>
<li><p>Zero infrastructure headaches</p>
</li>
<li><p>No backend setup required</p>
</li>
</ul>
<p>My live example:<br /><a target="_blank" href="https://huggingface.co/spaces/tkdutta/virtual_tkd">https://huggingface.co/spaces/tkdutta/virtual_tkd</a></p>
<p>Repo (if you want to explore or understand structure):<br /><a target="_blank" href="https://huggingface.co/spaces/tkdutta/virtual_tkd/tree/main">https://huggingface.co/spaces/tkdutta/virtual_tkd/tree/main</a></p>
<hr />
<h2 id="heading-why-i-built-this">Why I Built This</h2>
<p>Three simple reasons:</p>
<ol>
<li><p>Static portfolios feel lifeless</p>
</li>
<li><p>I wanted something interactive, memorable and dynamic</p>
</li>
<li><p>I wanted to integrate it into my <a target="_blank" href="https://www.tuhindutta.com/">portfolio app</a></p>
</li>
</ol>
<p>This chatbot:</p>
<ul>
<li><p>represents me</p>
</li>
<li><p>answers questions about me</p>
</li>
<li><p>adapts as I grow</p>
</li>
<li><p>and honestly, feels more “alive” than a PDF</p>
</li>
</ul>
<hr />
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>You only need:</p>
<ul>
<li><p>Hugging Face account</p>
</li>
<li><p>Groq API key (get your own from <a target="_blank" href="https://console.groq.com/keys">https://console.groq.com/keys</a>)</p>
</li>
<li><p>Two Google Drive <code>.txt</code> files</p>
</li>
</ul>
<p>That’s it.</p>
<hr />
<h1 id="heading-step-1-duplicate-the-space">🛠 Step 1: Duplicate the Space</h1>
<p>Go to:<br /><a target="_blank" href="https://huggingface.co/spaces/tkdutta/virtual_tkd">https://huggingface.co/spaces/tkdutta/virtual_tkd</a></p>
<p>Click <strong>Duplicate Space</strong>.</p>
<p>During duplication, Hugging Face will already give you fields to:</p>
<ul>
<li><p>Add <strong>Secrets</strong></p>
</li>
<li><p>Add <strong>Variables</strong></p>
</li>
</ul>
<p>You can set them right there or do it later. Either is fine.</p>
<p>Keep:</p>
<ul>
<li><p>Default CPU hardware (good enough)</p>
</li>
<li><p>Public or Private as per preference</p>
</li>
</ul>
<p>Done. You now have your own copy.</p>
<hr />
<h1 id="heading-step-2-add-space-secrets">🔐 Step 2: Add Space Secrets</h1>
<p>Go to:<br />Settings → Secrets</p>
<p>Add these.</p>
<hr />
<h3 id="heading-1-groqapikey">1️⃣ GROQ_API_KEY</h3>
<p>Your Groq API key.<br />Self-explanatory.</p>
<hr />
<h3 id="heading-2-knowledgecontexturl">2️⃣ KNOWLEDGE_CONTEXT_URL</h3>
<p>A public Google Drive link to a <strong>.txt file</strong> containing your knowledge.</p>
<p>What should be inside?<br />You decide:</p>
<ul>
<li><p>Work history</p>
</li>
<li><p>Projects</p>
</li>
<li><p>Achievements</p>
</li>
<li><p>Experience stories</p>
</li>
<li><p>Tech stack</p>
</li>
<li><p>Anything you want your chatbot to “know”</p>
</li>
</ul>
<p>This basically acts as your knowledge base.</p>
<p>Make sure:</p>
<ul>
<li><p>File is <code>.txt</code></p>
</li>
<li><p>Publicly shareable</p>
</li>
<li><p>Viewable by anyone</p>
</li>
</ul>
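<p>One gotcha (this helper is my own illustration, not part of the Space’s code): the file is fetched over HTTP, so a regular Drive share link that opens the preview page will not return raw text. A common pattern is to convert the share link into a direct-download URL:</p>

```python
import re

def drive_direct_url(share_url):
    """Turn a Google Drive share link (https://drive.google.com/file/d/FILE_ID/view...)
    into a direct-download URL that returns the raw file contents."""
    match = re.search(r"/d/([\w-]+)", share_url)
    if not match:
        raise ValueError("not a recognized Google Drive share link")
    return f"https://drive.google.com/uc?export=download&id={match.group(1)}"
```

<p>Fetching <code>drive_direct_url(link)</code> with any HTTP client then returns the file body instead of an HTML preview page.</p>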
<hr />
<h3 id="heading-3-instructionsurl">3️⃣ INSTRUCTIONS_URL</h3>
<p>Another Google Drive <code>.txt</code> file.</p>
<p>This shapes the chatbot’s:</p>
<ul>
<li><p>tone</p>
</li>
<li><p>behavior</p>
</li>
<li><p>personality</p>
</li>
<li><p>conversational style</p>
</li>
<li><p>boundaries</p>
</li>
<li><p>how it introduces you</p>
</li>
<li><p>how it replies to recruiters or visitors</p>
</li>
</ul>
<p>Think of this as your “system prompt.”<br />It defines your chatbot’s personality and behavior.</p>
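<p>To make the split between the two files concrete, here is a rough sketch of how they can be combined per request. The <code>build_messages</code> helper and its names are hypothetical, for illustration only; the actual Space code may differ:</p>

```python
def build_messages(instructions, knowledge, question):
    """Combine the instructions file (persona) and the knowledge file (facts)
    into a chat-completion message list. Hypothetical helper, for illustration."""
    system_prompt = (
        f"{instructions.strip()}\n\n"
        "Answer using only the following knowledge about the portfolio owner:\n"
        f"{knowledge.strip()}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]
```

<p>The returned list is the shape an OpenAI-compatible chat endpoint (like Groq’s) expects as its <code>messages</code> argument.</p>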
<hr />
<h1 id="heading-step-3-add-space-variables">⚙️ Step 3: Add Space Variables</h1>
<p>Go to:<br />Settings → Variables</p>
<p>These are simple configurable values.</p>
<hr />
<h3 id="heading-currentstatusmessage">CURRENT_STATUS_MESSAGE</h3>
<p>Your current status or what you’re focusing on.</p>
<p>Example:</p>
<pre><code class="lang-plaintext">Learning Terraform and preparing for certification.
</code></pre>
<p>This keeps changing over time, so keeping it separate avoids touching code repeatedly.</p>
<hr />
<h3 id="heading-lastknowledgeupdateddate">LAST_KNOWLEDGE_UPDATED_DATE</h3>
<p>To be transparent about the freshness of your knowledge base.</p>
<p>Example:</p>
<pre><code class="lang-plaintext">Dec 2025
</code></pre>
<p>Displayed in the chatbot UI footer so users know how up to date the information is.</p>
<hr />
<h3 id="heading-creatorname">CREATOR_NAME</h3>
<p>Name of the creator or portfolio owner.<br />This is displayed in the chatbot UI footer.</p>
<p>Example:</p>
<pre><code class="lang-plaintext">Tuhin Kumar Dutta
</code></pre>
<hr />
<h3 id="heading-website">WEBSITE</h3>
<p>Optional field.<br />If provided, this URL will be linked with the creator name in the footer.</p>
<p>Example:</p>
<pre><code class="lang-plaintext">https://www.tuhindutta.com/
</code></pre>
<p>If you don’t want to link anything, simply leave it empty.</p>
<hr />
<h1 id="heading-updating-and-rebuilding">🔄 Updating and Rebuilding</h1>
<p>Whenever:</p>
<ul>
<li><p>your Google Drive knowledge file changes</p>
</li>
<li><p>or your instruction file changes</p>
</li>
</ul>
<p>Go to Hugging Face and simply <strong>Rebuild the Space</strong>.</p>
<p>That reloads fresh content.</p>
<hr />
<h1 id="heading-step-4-test-it">🧪 Step 4: Test It</h1>
<p>Open your chatbot.</p>
<p>Ask:</p>
<ul>
<li><p>“Tell me about your experience”</p>
</li>
<li><p>“What projects have you built?”</p>
</li>
<li><p>“What are you currently working on?”</p>
</li>
<li><p>“Explain your expertise”</p>
</li>
</ul>
<p>If it sounds like you, great.<br />If not, tune your <code>.txt</code> files and rebuild.</p>
<hr />
<h1 id="heading-important-behavior-notes">📌 Important Behavior Notes</h1>
<h3 id="heading-1-groq-api-rate-limits">1️⃣ Groq API Rate Limits</h3>
<p>Your chatbot follows:</p>
<ul>
<li><p>Groq API rate limits</p>
</li>
<li><p>Model usage caps</p>
</li>
</ul>
<p>If you want higher usage or smoother performance, upgrade your Groq plan.</p>
<hr />
<h3 id="heading-2-hugging-face-free-tier-sleep-policy">2️⃣ Hugging Face Free Tier Sleep Policy</h3>
<p>If you keep HF Space on free CPU tier:</p>
<ul>
<li><p>It <strong>goes to sleep after 48 hours of inactivity</strong></p>
</li>
<li><p>To keep it alive, open it at least once every 48 hours</p>
</li>
<li><p>Or get an HF paid subscription to keep it always active</p>
</li>
</ul>
<p>So decide based on how actively you want it running.</p>
<hr />
<h3 id="heading-3-no-conversation-memory">3️⃣ No Conversation Memory</h3>
<p>The chatbot does not track past chats.</p>
<p>Each message is handled as a fresh independent query.</p>
<p>Honestly, for a portfolio bot, this is good enough and keeps things simple.</p>
<hr />
<h3 id="heading-4-why-use-hugging-face-instead-of-hosting-yourself">4️⃣ Why Use Hugging Face Instead of Hosting Yourself?</h3>
<p>Because:</p>
<ul>
<li><p>Zero infra maintenance</p>
</li>
<li><p>Built-in logs</p>
</li>
<li><p>Built-in hosting</p>
</li>
<li><p>No deployment hassles</p>
</li>
<li><p>Easy duplication</p>
</li>
<li><p>Simple updates</p>
</li>
</ul>
<p>This entire HF Space approach exists to make life easier.</p>
<p>If someone wants, they can still:</p>
<ul>
<li><p>use the repo code</p>
</li>
<li><p>build their own app</p>
</li>
<li><p>deploy it wherever they want</p>
</li>
</ul>
<p>But most people simply want it to work without pain. HF solves that.</p>
<hr />
<h3 id="heading-5-you-can-use-your-own-domain">5️⃣ You Can Use Your Own Domain</h3>
<p>Yes, you can point a custom domain at your HF Space.</p>
<p>So your chatbot can live behind a cleaner URL.</p>
<hr />
<h1 id="heading-final-thoughts">🎯 Final Thoughts</h1>
<p>This chatbot is meant to be:</p>
<ul>
<li><p>simple</p>
</li>
<li><p>personal</p>
</li>
<li><p>useful</p>
</li>
<li><p>easy to maintain</p>
</li>
</ul>
<p>No over-engineering.<br />No unnecessary complexity.</p>
<p>Just a neat digital twin that represents you well.</p>
<p>If you build yours, I’d genuinely love to see it.</p>
]]></content:encoded></item><item><title><![CDATA[Concurrency in Python]]></title><description><![CDATA[Python’s concurrency tools let you optimize programs by running tasks in overlapping periods—great for cutting down wait times in I/O-heavy work and taking advantage of multiple CPU cores for compute-intensive tasks. In this article, I’m focusing on ...]]></description><link>https://techtrail.tuhindutta.com/concurrency-in-python</link><guid isPermaLink="true">https://techtrail.tuhindutta.com/concurrency-in-python</guid><category><![CDATA[Python]]></category><category><![CDATA[concurrency]]></category><category><![CDATA[optimization]]></category><category><![CDATA[Threading]]></category><category><![CDATA[multithreading]]></category><category><![CDATA[multiprocessing]]></category><category><![CDATA[asyncio]]></category><category><![CDATA[asynchronous programming]]></category><category><![CDATA[parallelism]]></category><category><![CDATA[GIL]]></category><category><![CDATA[producer consumer]]></category><dc:creator><![CDATA[Tuhin Kumar Dutta]]></dc:creator><pubDate>Fri, 26 Dec 2025 17:46:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/cG9xY_1NBcA/upload/a6ab8f3cbcc64ddeb4a232a68c47242f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Python’s concurrency tools let you optimize programs by running tasks in overlapping periods—great for cutting down wait times in I/O-heavy work and taking advantage of multiple CPU cores for compute-intensive tasks. In this article, I’m focusing on the most practical and widely used approaches: <strong>multithreading</strong> for I/O-bound workloads, <strong>multiprocessing</strong> for CPU-bound tasks, and <strong>asyncio</strong> for high-scale, single-threaded efficiency.</p>
<p>Concurrency in Python is a deep topic, and no single article can cover every nuance. There are advanced patterns, tricky edge cases, and evolving best practices you’ll discover over time. Treat this as a solid starting point. Then experiment, break things, debug them, and refine your understanding—hands-on work is what really makes these concepts click.</p>
<p>And please, read this sequentially from start to finish… not concurrently. Otherwise, you may end up with a race condition in your understanding. 😄</p>
<p>Let’s dive in!</p>
<h1 id="heading-what-is-concurreny">What is concurrency?</h1>
<p>Tasks are said to run concurrently when they execute simultaneously, or in an interleaved, seemingly (pseudo-)simultaneous way, making efficient use of time and compute resources.</p>
<hr />
<h1 id="heading-why-is-it-required">Why is it required?</h1>
<ol>
<li><h2 id="heading-io-bound-task">I/O bound task</h2>
<p> Simply put, by default Python code runs sequentially. Whenever a blocking task is encountered, the execution waits until the current task completes before moving forward. In some scenarios, this wait period is long enough that system resources remain largely idle. We can take advantage of this window to execute other tasks and return once the original task is ready to continue.</p>
</li>
<li><h2 id="heading-cpu-bound-task">CPU bound task</h2>
<p> There are cases where multiple tasks require heavy CPU processing. With sequential execution, each task must finish before the next one starts. Using multiprocessing, these tasks can be distributed across different CPU cores and executed in parallel, reducing overall execution time.</p>
</li>
</ol>
<hr />
<h1 id="heading-important-terminologies">Important Terminologies</h1>
<h2 id="heading-thread"><strong>Thread</strong></h2>
<p>It is a unit of execution within a process that runs a piece of code. By default, since we have a single thread, all the tasks are handled by the same thread sequentially.</p>
<p>For example, say we have 3 tasks:</p>
<ul>
<li><p>Load <code>customers</code> dataset.</p>
</li>
<li><p>Load <code>accounts</code> dataset.</p>
</li>
<li><p>Load <code>orders</code> dataset.</p>
</li>
</ul>
<p>Multi-threading allows us to create multiple threads to perform the tasks concurrently.</p>
<p><strong><em>Note: This is not parallel computing or multiprocessing; the exact difference is discussed later in the Threading section.</em></strong></p>
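<p>The three dataset loads above can be sketched like this (the <code>load_dataset</code> function is a stand-in that simulates blocking I/O with <code>sleep</code>):</p>

```python
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter, sleep

def load_dataset(name):
    sleep(0.2)  # stand-in for a blocking I/O wait (network or disk)
    return f"{name} loaded"

start = perf_counter()
# Three threads wait on their "I/O" at the same time instead of one after another
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(load_dataset, ["customers", "accounts", "orders"]))
elapsed = perf_counter() - start

print(results)             # results arrive in submission order
print(f"{elapsed:.2f}s")   # roughly 0.2s instead of ~0.6s sequentially
```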
<h2 id="heading-multiprocess"><strong>Multiprocess</strong></h2>
<p>Multiprocessing is the ability to execute tasks using multiple processes. Each process has its own memory space and can run on a separate CPU core, enabling true parallel execution.</p>
<h2 id="heading-cpython">CPython</h2>
<p>CPython, the reference implementation of Python, is:</p>
<ol>
<li><p>a <strong>bytecode interpreter</strong></p>
</li>
<li><p>written in <strong>C</strong></p>
</li>
<li><p>managing <strong>millions of tiny objects</strong></p>
</li>
<li><p>optimized for <strong>single-thread performance</strong></p>
</li>
<li><p>The core technical problem CPython faces:</p>
<ol>
<li><p>Every Python object has reference counts, mutable internal state, shared memory.</p>
</li>
<li><p>So CPython must ensure:</p>
<ul>
<li><p>reference counts stay correct</p>
</li>
<li><p>objects aren’t freed while still in use</p>
</li>
<li><p>memory isn’t corrupted</p>
</li>
</ul>
</li>
</ol>
</li>
</ol>
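<p>Reference counting is easy to observe (note that <code>sys.getrefcount</code> itself temporarily adds one reference to its argument):</p>

```python
import sys

data = []
before = sys.getrefcount(data)  # includes the temporary reference held by getrefcount itself
alias = data                    # create a second reference to the same list object
after = sys.getrefcount(data)

print(before, after)  # "after" is exactly one higher than "before"
```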
<h2 id="heading-gil"><strong>GIL</strong></h2>
<p>The <strong>Global Interpreter Lock</strong> (GIL) is a mechanism in CPython that allows only one thread to execute Python bytecode at a time, which limits true parallelism in multithreaded, CPU-bound programs.</p>
<p>The GIL ensures:</p>
<ul>
<li><p>only <strong>one thread</strong> executes Python bytecode at a time</p>
</li>
<li><p>threads switch at well-defined points</p>
</li>
<li><p>memory state stays consistent</p>
</li>
</ul>
<p>It does <strong>not</strong>:</p>
<ul>
<li><p>prevent I/O parallelism</p>
</li>
<li><p>block native code from running</p>
</li>
<li><p>affect multiprocessing</p>
</li>
</ul>
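<p>Those “well-defined points” are visible in CPython: the interpreter periodically asks the running thread to release the GIL, and the interval is exposed through <code>sys</code>:</p>

```python
import sys

# How often (in seconds) CPython asks the running thread to drop the GIL
# so another runnable thread can acquire it.
interval = sys.getswitchinterval()
print(interval)  # typically 0.005 (5 ms) by default

# The interval is tunable, e.g. for latency-sensitive threaded code:
sys.setswitchinterval(0.001)
```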
<p><strong><em>Why is GIL required?</em></strong></p>
<ol>
<li><p>Without a GIL, <strong>every single object operation</strong> would need locks.</p>
</li>
<li><p>CPython designers chose one global lock instead of many tiny locks.</p>
</li>
<li><p>This dramatically simplifies:</p>
<ul>
<li><p>memory management</p>
</li>
<li><p>garbage collection</p>
</li>
<li><p>C-extension APIs</p>
</li>
<li><p>interpreter correctness</p>
</li>
</ul>
</li>
<li><p>Removing the GIL means building a completely new Python interpreter:</p>
<ul>
<li><p>rewriting memory management</p>
</li>
<li><p>redesigning object model</p>
</li>
<li><p>breaking C extensions</p>
</li>
<li><p>slowing down single-thread code</p>
</li>
<li><p>introducing subtle race bugs</p>
</li>
</ul>
</li>
</ol>
<h2 id="heading-other-definitions">Other definitions</h2>
<p>Refer to the following definitions when you encounter the terms while reading for better context:</p>
<ul>
<li><p><strong>Race Condition</strong> – A race condition happens when two or more threads or processes try to access and modify the same shared resource at the same time, and the final outcome depends on the order in which those operations happen. Since this order is unpredictable, the result becomes inconsistent, incorrect, and difficult to debug.</p>
<p>  Example<br />  Imagine you have a shared variable <code>balance = 100</code>.<br />  Two threads are trying to withdraw <code>50</code> at the same time.</p>
<p>  Both read the value as <code>100</code>, both subtract <code>50</code>, and both write back <code>50</code>.<br />  Logically, the balance should be <code>0</code>, but you end up with <code>50</code>. That is a race condition.</p>
<p>  Why it happens</p>
<ul>
<li><p>Shared resource</p>
</li>
<li><p>Multiple threads accessing it</p>
</li>
<li><p>No proper coordination or locking</p>
</li>
</ul>
</li>
</ul>
<p>    Result</p>
<ul>
<li><p>Random output</p>
</li>
<li><p>Rare bugs</p>
</li>
<li><p>Inconsistent behavior</p>
</li>
</ul>
<ul>
<li><p><strong>Event Loop</strong> – The event loop is the core of <code>asyncio</code>. It continuously runs in a single thread, schedules tasks, and switches between coroutines whenever they pause on an awaited operation. Instead of waiting idly, the event loop keeps other tasks moving, which enables concurrency without using multiple threads.</p>
</li>
<li><p><strong>Coroutines</strong> – Coroutines are special functions defined with <code>async def</code> that support asynchronous execution. Instead of blocking, they pause using <code>await</code> while waiting for I/O or other asynchronous work, allowing the event loop to run other coroutines in the meantime.</p>
</li>
<li><p><strong>Non-Blocking Operation</strong> – A non-blocking operation is an operation that does not stop execution while waiting for a result. Instead of freezing the program, it immediately returns control and resumes later when the result is ready. In asyncio, most I/O operations (like network calls, file operations, timers, etc.) are non-blocking, allowing other tasks to run during the wait time.</p>
</li>
</ul>
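<p>A minimal sketch ties these three terms together: two coroutines <code>await</code> a non-blocking sleep, and the event loop overlaps their wait time:</p>

```python
import asyncio
from time import perf_counter

async def fetch(name, delay):
    await asyncio.sleep(delay)  # non-blocking: yields control back to the event loop
    return f"{name} done"

async def main():
    # gather schedules both coroutines; the event loop interleaves their waits
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2))

start = perf_counter()
results = asyncio.run(main())  # asyncio.run creates and drives the event loop
elapsed = perf_counter() - start

print(results)             # ['a done', 'b done']
print(f"{elapsed:.2f}s")   # about 0.2s, not 0.4s: the waits overlapped
```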
<hr />
<h1 id="heading-models-for-concurrency">Models for concurrency</h1>
<h2 id="heading-multithreading">Multithreading</h2>
<ul>
<li><p>Used primarily for I/O-bound tasks, where threads spend most of their time waiting on OS I/O, not executing Python bytecode.</p>
</li>
<li><p>In such scenarios, multiple threads are used.</p>
</li>
<li><p>Each thread is responsible for a particular task; threads execute their tasks but may share data and state.</p>
</li>
<li><p>The thread releases the GIL only when it enters a blocking I/O operation implemented in C that explicitly releases the GIL.</p>
</li>
<li><p>Another runnable thread may acquire the GIL and continue executing Python bytecode.</p>
</li>
<li><p>Thus, while some threads are blocked on I/O, other threads can make progress, reducing idle CPU time.</p>
</li>
<li><p>All threads share:</p>
<ul>
<li><p>variables</p>
</li>
<li><p>memory</p>
</li>
<li><p>interpreter</p>
</li>
<li><p>GIL</p>
</li>
</ul>
</li>
<li><p>This sharing is why data corruption through <strong>race conditions</strong> is possible.</p>
</li>
<li><p>Threads <em>can</em> manipulate shared global variables, which requires explicit synchronization to avoid race conditions.</p>
</li>
</ul>
<h3 id="heading-demo">Demo</h3>
<p>Using the <a target="_blank" href="https://docs.anilist.co/guide/graphql/">Anilist API</a> to demonstrate the difference in execution time between the synchronous and asynchronous approaches for 15 requests.</p>
<p><strong>AnimeAPI:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> requests

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">AnimeAPI</span>:</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        self.query = <span class="hljs-string">'''
query ($id: Int) { # Define which variables will be used in the query (id)
  Media (id: $id, type: ANIME) { # Insert our variables into the query arguments (id) (type: ANIME is hard-coded in the query)
    id
    title {
      romaji
      english
      native
    }
  }
}
'''</span>
        self.url = <span class="hljs-string">'https://graphql.anilist.co'</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">response</span>(<span class="hljs-params">self, anime_id:int</span>):</span>
        variables = {
            <span class="hljs-string">'id'</span>: anime_id
        }
        response = requests.post(self.url, json={<span class="hljs-string">'query'</span>: self.query, <span class="hljs-string">'variables'</span>: variables})
        <span class="hljs-keyword">return</span> response.json()
</code></pre>
<p><strong>Multithreading execution:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> api <span class="hljs-keyword">import</span> AnimeAPI
<span class="hljs-keyword">from</span> time <span class="hljs-keyword">import</span> perf_counter
<span class="hljs-keyword">from</span> concurrent.futures <span class="hljs-keyword">import</span> ThreadPoolExecutor, as_completed

anime = AnimeAPI()

anime_ids = [<span class="hljs-number">100</span>,<span class="hljs-number">200</span>,<span class="hljs-number">300</span>,<span class="hljs-number">400</span>, <span class="hljs-number">700</span>, <span class="hljs-number">628</span>, <span class="hljs-number">524</span>, <span class="hljs-number">377</span>, <span class="hljs-number">826</span>, <span class="hljs-number">451</span>, <span class="hljs-number">280</span>, <span class="hljs-number">395</span>, <span class="hljs-number">399</span>, <span class="hljs-number">124</span>, <span class="hljs-number">626</span>]


<span class="hljs-comment"># Sync code #############################################################################</span>

anime_dic = {}

sync_start = perf_counter()

<span class="hljs-keyword">for</span> anime_id <span class="hljs-keyword">in</span> anime_ids:
    <span class="hljs-keyword">try</span>:
        anime_dic[anime_id] = anime.response(anime_id)
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        anime_dic[anime_id] = e

sync_end = perf_counter()

<span class="hljs-comment">#####################################################################################</span>

<span class="hljs-comment"># Async code ############################################################################</span>

anime_dic_async = {}

async_start = perf_counter()

<span class="hljs-keyword">with</span> ThreadPoolExecutor(max_workers=<span class="hljs-number">5</span>) <span class="hljs-keyword">as</span> executor:
    futures = {executor.submit(anime.response, i): i <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> anime_ids}

    <span class="hljs-keyword">for</span> future <span class="hljs-keyword">in</span> as_completed(futures):

        anime_id = futures[future]

        <span class="hljs-keyword">try</span>:
            data = future.result()
        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            data = e

        anime_dic_async[anime_id] = data

async_end = perf_counter()

<span class="hljs-comment">#####################################################################################</span>

sync_time = sync_end - sync_start
async_time = async_end - async_start

print(<span class="hljs-string">f'''
Number of animes requested synchronously: <span class="hljs-subst">{len(anime_ids)}</span>
Number of outputs received synchronously: <span class="hljs-subst">{len(anime_dic)}</span>
Time taken for synchronous execution: <span class="hljs-subst">{sync_time}</span>

##############################################################

Number of animes requested asynchronously: <span class="hljs-subst">{len(anime_ids)}</span>
Number of outputs received asynchronously: <span class="hljs-subst">{len(anime_dic_async)}</span>
Time taken for asynchronous execution: <span class="hljs-subst">{async_time}</span>
'''</span>)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-bash">Number of animes requested synchronously: 15
Number of outputs received synchronously: 15
Time taken <span class="hljs-keyword">for</span> synchronous execution: 10.28090550005436

<span class="hljs-comment">##############################################################</span>

Number of animes requested asynchronously: 15
Number of outputs received asynchronously: 15
Time taken <span class="hljs-keyword">for</span> asynchronous execution: 3.0076786999125034
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Note the significant reduction in the execution time in concurrency.</div>
</div>

<p><strong>Race condition demo:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> time <span class="hljs-keyword">import</span> sleep
<span class="hljs-keyword">from</span> threading <span class="hljs-keyword">import</span> Thread

balance = <span class="hljs-number">1000</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">withdraw</span>(<span class="hljs-params">amount: float</span>):</span>
    <span class="hljs-keyword">global</span> balance
    temp = balance
    sleep(<span class="hljs-number">0.001</span>)
    balance = temp - amount

threads = [Thread(target=withdraw, args=[<span class="hljs-number">50</span>]) <span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(<span class="hljs-number">4</span>)]

<span class="hljs-keyword">for</span> thread <span class="hljs-keyword">in</span> threads:
    thread.start()

<span class="hljs-keyword">for</span> thread <span class="hljs-keyword">in</span> threads:
    thread.join()

print(<span class="hljs-string">f"Expected: 800, Actual: <span class="hljs-subst">{balance}</span>"</span>)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-bash">Expected: 800, Actual: 950
</code></pre>
<p><strong>Solving race condition using Lock:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> time <span class="hljs-keyword">import</span> sleep
<span class="hljs-keyword">from</span> threading <span class="hljs-keyword">import</span> Thread, Lock, current_thread

balance = <span class="hljs-number">1000</span>

lock = Lock()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">withdraw</span>(<span class="hljs-params">amount: float</span>):</span>
    <span class="hljs-keyword">global</span> balance
    <span class="hljs-keyword">with</span> lock:
        temp = balance
        print(<span class="hljs-string">f"Thread <span class="hljs-subst">{current_thread()}</span>: Reading balance = <span class="hljs-subst">{temp}</span>"</span>)
        sleep(<span class="hljs-number">0.001</span>)
        balance = temp - amount
        print(<span class="hljs-string">f"Thread <span class="hljs-subst">{current_thread()}</span>: New balance = <span class="hljs-subst">{balance}</span>"</span>)

threads = [Thread(target=withdraw, args=[<span class="hljs-number">50</span>]) <span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(<span class="hljs-number">4</span>)]

<span class="hljs-keyword">for</span> thread <span class="hljs-keyword">in</span> threads:
    thread.start()

<span class="hljs-keyword">for</span> thread <span class="hljs-keyword">in</span> threads:
    thread.join()

print(<span class="hljs-string">f"Expected: 800, Actual: <span class="hljs-subst">{balance}</span>"</span>)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-bash">Thread &lt;Thread(Thread-1 (withdraw), started 27604)&gt;: Reading balance = 1000
Thread &lt;<span class="hljs-keyword">function</span> current_thread at 0x000001758BD21BC0&gt;: New balance = 950
Thread &lt;Thread(Thread-2 (withdraw), started 29580)&gt;: Reading balance = 950
Thread &lt;<span class="hljs-keyword">function</span> current_thread at 0x000001758BD21BC0&gt;: New balance = 900
Thread &lt;Thread(Thread-3 (withdraw), started 29852)&gt;: Reading balance = 900
Thread &lt;<span class="hljs-keyword">function</span> current_thread at 0x000001758BD21BC0&gt;: New balance = 850
Thread &lt;Thread(Thread-4 (withdraw), started 7656)&gt;: Reading balance = 850
Thread &lt;<span class="hljs-keyword">function</span> current_thread at 0x000001758BD21BC0&gt;: New balance = 800
Expected: 800, Actual: 800
</code></pre>
<h2 id="heading-multiprocessing">Multiprocessing</h2>
<ul>
<li><p>Each process has its own threads and its own GIL, but processes do not share a GIL with each other.</p>
</li>
<li><p>When there are multiple CPU-bound tasks, they are distributed among different cores of the CPU, each handling the task in an isolated environment. Thus, multiple processes run on multiple cores.</p>
</li>
<li><p>The OS scheduler maps processes to cores, enabling true parallel execution of CPU-bound work.</p>
</li>
<li><p>Each process has its own copy of:</p>
<ul>
<li><p>variables</p>
</li>
<li><p>memory</p>
</li>
<li><p>interpreter</p>
</li>
<li><p>GIL</p>
</li>
</ul>
</li>
<li><p>Memory is not shared by default; data is exchanged between processes via serialization (pickling), which adds overhead.</p>
</li>
<li><p>Processes cannot directly manipulate the same global variable because they do not share memory space. Shared memory can be explicitly created, but it is not the default and must be managed carefully.</p>
</li>
<li><p>For orchestrating processes and transferring data between them, <code>Queue</code> is used to build multiprocessing pipelines. It lets <code>producers</code> push data and <code>consumers</code> retrieve it safely, without conflicts.</p>
</li>
</ul>
<h3 id="heading-demo-1"><strong>Demo</strong></h3>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> time <span class="hljs-keyword">import</span> perf_counter
<span class="hljs-keyword">from</span> concurrent.futures <span class="hljs-keyword">import</span> ProcessPoolExecutor, as_completed


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">calculate_factorial</span>(<span class="hljs-params">start: int, end: int</span>):</span>
    result = <span class="hljs-number">1</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(start, end + <span class="hljs-number">1</span>):
        result *= i
    <span class="hljs-comment"># Adding extra computation to make it CPU-heavy</span>
    <span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(<span class="hljs-number">2000000</span>):
        result = (result % <span class="hljs-number">1000000007</span>) * <span class="hljs-number">2</span>
    <span class="hljs-keyword">return</span> result


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    processes = [(<span class="hljs-number">2</span>,<span class="hljs-number">38</span>), (<span class="hljs-number">5</span>, <span class="hljs-number">50</span>), (<span class="hljs-number">68</span>, <span class="hljs-number">83</span>), (<span class="hljs-number">14</span>, <span class="hljs-number">57</span>),
                 (<span class="hljs-number">3</span>,<span class="hljs-number">38</span>), (<span class="hljs-number">6</span>, <span class="hljs-number">50</span>), (<span class="hljs-number">69</span>, <span class="hljs-number">83</span>), (<span class="hljs-number">15</span>, <span class="hljs-number">57</span>),
                 (<span class="hljs-number">4</span>,<span class="hljs-number">38</span>), (<span class="hljs-number">7</span>, <span class="hljs-number">50</span>), (<span class="hljs-number">70</span>, <span class="hljs-number">83</span>), (<span class="hljs-number">16</span>, <span class="hljs-number">57</span>),
                 (<span class="hljs-number">5</span>,<span class="hljs-number">38</span>), (<span class="hljs-number">8</span>, <span class="hljs-number">50</span>), (<span class="hljs-number">25</span>, <span class="hljs-number">83</span>), (<span class="hljs-number">17</span>, <span class="hljs-number">57</span>),
                 (<span class="hljs-number">6</span>,<span class="hljs-number">38</span>), (<span class="hljs-number">9</span>, <span class="hljs-number">50</span>), (<span class="hljs-number">26</span>, <span class="hljs-number">83</span>), (<span class="hljs-number">18</span>, <span class="hljs-number">57</span>)]
    cpu_count = os.cpu_count()
    cpu2use = int(<span class="hljs-number">0.5</span> * cpu_count)

    <span class="hljs-comment"># Sync code #############################################################################</span>

    results = {}

    sync_start = perf_counter()

    <span class="hljs-keyword">for</span> process <span class="hljs-keyword">in</span> processes:
        <span class="hljs-keyword">try</span>:
            results[process] = calculate_factorial(*process)
        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            results[process] = e

    sync_end = perf_counter()

    <span class="hljs-comment">#####################################################################################</span>

    <span class="hljs-comment"># Async code ############################################################################</span>

    results_async = {}

    async_start = perf_counter()

    <span class="hljs-keyword">with</span> ProcessPoolExecutor(max_workers = cpu2use) <span class="hljs-keyword">as</span> executor:
        futures = {executor.submit(calculate_factorial, *i): i <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> processes}

        <span class="hljs-keyword">for</span> future <span class="hljs-keyword">in</span> as_completed(futures):

            process_id = futures[future]

            <span class="hljs-keyword">try</span>:
                data = future.result()
            <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
                data = e

            results_async[process_id] = data

    async_end = perf_counter()

    <span class="hljs-comment">#####################################################################################</span>

    sync_time = sync_end - sync_start
    async_time = async_end - async_start

    print(<span class="hljs-string">f'''
Number of processes processed synchronously: <span class="hljs-subst">{len(processes)}</span>
Number of outputs received synchronously: <span class="hljs-subst">{len(results)}</span>
Time taken for synchronous execution: <span class="hljs-subst">{sync_time}</span>

##############################################################

Number of processes processed asynchronously: <span class="hljs-subst">{len(processes)}</span>
Number of outputs received asynchronously: <span class="hljs-subst">{len(results_async)}</span>
Time taken for asynchronous execution: <span class="hljs-subst">{async_time}</span>
'''</span>)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-bash">Number of processes processed synchronously: 20
Number of outputs received synchronously: 20
Time taken <span class="hljs-keyword">for</span> synchronous execution: 3.7746968001592904

<span class="hljs-comment">##############################################################</span>

Number of processes processed asynchronously: 20
Number of outputs received asynchronously: 20
Time taken <span class="hljs-keyword">for</span> asynchronous execution: 1.1230935999192297
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Note the significant reduction in execution time with concurrent execution.</div>
</div>

<p><strong>Queue demo</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">from</span> multiprocessing <span class="hljs-keyword">import</span> Process, Queue


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">save_data_to_json</span>(<span class="hljs-params">data:list</span>):</span>
    <span class="hljs-keyword">try</span>:
        <span class="hljs-keyword">with</span> open(<span class="hljs-string">'data.json'</span>, <span class="hljs-string">'r'</span>) <span class="hljs-keyword">as</span> file:
            loaded_data = json.load(file)
            loaded_data += data
    <span class="hljs-keyword">except</span> (FileNotFoundError, json.JSONDecodeError):
        loaded_data = data
    <span class="hljs-keyword">with</span> open(<span class="hljs-string">'data.json'</span>, <span class="hljs-string">'w'</span>, encoding=<span class="hljs-string">'utf-8'</span>) <span class="hljs-keyword">as</span> json_file:
        json.dump(loaded_data, json_file, indent=<span class="hljs-number">4</span>)
    print(<span class="hljs-string">'Data saved.'</span>)


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">producer</span>(<span class="hljs-params">queue1:Queue, items:list</span>):</span>
    <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> items:
        print(<span class="hljs-string">f'Producing <span class="hljs-subst">{item}</span>'</span>)
        queue1.put(item)
    queue1.put(<span class="hljs-literal">None</span>)
    print(<span class="hljs-string">'Producer done'</span>)


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">transformer</span>(<span class="hljs-params">queue1:Queue, queue2:Queue, factor:float</span>):</span>
    <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
        item = queue1.get()
        <span class="hljs-keyword">if</span> item <span class="hljs-keyword">is</span> <span class="hljs-literal">None</span>:
            queue2.put(<span class="hljs-literal">None</span>)
            <span class="hljs-keyword">break</span>
        transformed = factor * item
        print(<span class="hljs-string">f'Transforming <span class="hljs-subst">{item}</span> -&gt; <span class="hljs-subst">{transformed}</span>'</span>)
        queue2.put(transformed)
    print(<span class="hljs-string">'Transformer done'</span>)


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">save</span>(<span class="hljs-params">queue2:Queue</span>):</span>
    data = []
    <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
        item = queue2.get()
        <span class="hljs-keyword">if</span> item <span class="hljs-keyword">is</span> <span class="hljs-literal">None</span>:
            <span class="hljs-keyword">break</span>
        print(<span class="hljs-string">f'Getting <span class="hljs-subst">{item}</span>'</span>)
        data.append(item)
    save_data_to_json(data)


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
    <span class="hljs-keyword">if</span> os.path.exists(<span class="hljs-string">'data.json'</span>):  <span class="hljs-comment"># avoid crashing on the first run</span>
        os.remove(<span class="hljs-string">'data.json'</span>)

    data = [<span class="hljs-number">2</span>,<span class="hljs-number">4</span>,<span class="hljs-number">6</span>,<span class="hljs-number">3</span>,<span class="hljs-number">5</span>,<span class="hljs-number">7</span>,<span class="hljs-number">9</span>]

    queue1 = Queue()
    queue2 = Queue()

    processes = []

    processes.append(Process(target=producer, args=(queue1, data)))
    processes.append(Process(target=transformer, args=(queue1, queue2, <span class="hljs-number">0.2</span>)))
    processes.append(Process(target=save, args=(queue2,)))

    <span class="hljs-keyword">for</span> process <span class="hljs-keyword">in</span> processes:
        process.start()

    <span class="hljs-keyword">for</span> process <span class="hljs-keyword">in</span> processes:
        process.join()

    <span class="hljs-keyword">with</span> open(<span class="hljs-string">'data.json'</span>, <span class="hljs-string">'r'</span>) <span class="hljs-keyword">as</span> file:
        saved_data = json.load(file)

    print(saved_data)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-bash">Producing 2
Producing 4
Producing 6
Producing 3
Producing 5
Producing 7
Producing 9
Producer <span class="hljs-keyword">done</span>
Transforming 2 -&gt; 0.4
Transforming 4 -&gt; 0.8
Transforming 6 -&gt; 1.2000000000000002
Transforming 3 -&gt; 0.6000000000000001
Transforming 5 -&gt; 1.0
Transforming 7 -&gt; 1.4000000000000001
Transforming 9 -&gt; 1.8
Transformer <span class="hljs-keyword">done</span>
Getting 0.4
Getting 0.8
Getting 1.2000000000000002
Getting 0.6000000000000001
Getting 1.0
Getting 1.4000000000000001
Getting 1.8
Data saved.
[0.4, 0.8, 1.2000000000000002, 0.6000000000000001, 1.0, 1.4000000000000001, 1.8]
</code></pre>
<h2 id="heading-asyncio">Asyncio</h2>
<ul>
<li><p>Asyncio does not use multiprocessing. It also does not rely on multithreading by default, although it can use threads in specific situations, which we will discuss later.</p>
</li>
<li><p>Instead, asyncio achieves concurrency using an <strong>event loop</strong>. The event loop runs multiple <strong>coroutines</strong> together by scheduling them and switching between them whenever a coroutine performs a <strong>non-blocking operation</strong> and awaits it, instead of blocking execution.</p>
</li>
<li><p>In simple terms, coroutines cooperatively yield control, allowing other coroutines to run during I/O waits, which reduces idle time.</p>
</li>
<li><p>The <code>asyncio</code> library provides the event loop, coroutines, tasks and futures that work together with the <code>async</code> and <code>await</code> syntax built into Python.</p>
</li>
<li><p>When we need to run a blocking operation inside an asyncio program, asyncio provides a way to execute it in a separate thread so the event loop doesn’t get blocked. Internally, asyncio submits these tasks to a <code>ThreadPoolExecutor</code> (similar to what we saw earlier in the multithreading example).</p>
</li>
<li><p><code>aiohttp</code> is commonly used alongside <code>asyncio</code> as an asynchronous HTTP client and server framework. It is ideal for building RESTful APIs, handling a large number of concurrent network connections, and performing tasks like web scraping without blocking the event loop.</p>
</li>
</ul>
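<p>Before the demo, the cooperative scheduling described above can be sketched minimally; here <code>asyncio.sleep</code> stands in for a non-blocking I/O wait:</p>

```python
import asyncio


async def fetch(name: str, delay: float):
    # awaiting a non-blocking operation suspends this coroutine
    # and hands control back to the event loop
    await asyncio.sleep(delay)
    return name


async def main():
    # both coroutines make progress on a single thread;
    # total runtime is roughly max(0.2, 0.1), not the sum
    results = await asyncio.gather(fetch('a', 0.2), fetch('b', 0.1))
    print(results)  # ['a', 'b']


asyncio.run(main())
```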
<h3 id="heading-demo-2">Demo</h3>
<p><strong>Asyncio Anime API:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> aiohttp
<span class="hljs-keyword">import</span> asyncio


<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">AsyncAnimeAPI</span>:</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        self.query = <span class="hljs-string">'''
query ($id: Int) {
  Media (id: $id, type: ANIME) {
    id
    title {
      romaji
      english
      native
    }
  }
}
'''</span>
        self.url = <span class="hljs-string">'https://graphql.anilist.co'</span>

    <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">response</span>(<span class="hljs-params">self, anime_id: int</span>):</span>
        <span class="hljs-string">"""Async method to fetch anime data"""</span>
        variables = {
            <span class="hljs-string">'id'</span>: anime_id
        }

        <span class="hljs-keyword">try</span>:

          <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> aiohttp.ClientSession() <span class="hljs-keyword">as</span> session:
              <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> session.post(
                  self.url, 
                  json={<span class="hljs-string">'query'</span>: self.query, <span class="hljs-string">'variables'</span>: variables}
              ) <span class="hljs-keyword">as</span> response:

                  <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> response.json()

        <span class="hljs-keyword">except</span> asyncio.TimeoutError:
            <span class="hljs-keyword">return</span> {<span class="hljs-string">'error'</span>: <span class="hljs-string">'Timeout'</span>, <span class="hljs-string">'anime_id'</span>: anime_id}
        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            <span class="hljs-keyword">return</span> {<span class="hljs-string">'error'</span>: str(e), <span class="hljs-string">'anime_id'</span>: anime_id}
</code></pre>
<p><code>timedec</code> <strong>decorator to calculate execution time:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> functools <span class="hljs-keyword">import</span> wraps
<span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">import</span> time


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">timedec</span>(<span class="hljs-params">operation_name:str</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">decorator</span>(<span class="hljs-params">func</span>):</span>
        <span class="hljs-comment"># Coroutine functions need an async wrapper: a sync wrapper would only</span>
        <span class="hljs-comment"># measure how long it takes to create the coroutine object</span>
        <span class="hljs-comment"># (microseconds), not the awaited execution itself.</span>
        <span class="hljs-keyword">if</span> asyncio.iscoroutinefunction(func):
<span class="hljs-meta">            @wraps(func)</span>
            <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">async_wrapper</span>(<span class="hljs-params">*args, **kwargs</span>):</span>
                start = time.perf_counter()
                result = <span class="hljs-keyword">await</span> func(*args, **kwargs)
                end = time.perf_counter()
                print(<span class="hljs-string">f"Total time taken for <span class="hljs-subst">{operation_name}</span>: <span class="hljs-subst">{end - start}</span>."</span>)
                <span class="hljs-keyword">return</span> result
            <span class="hljs-keyword">return</span> async_wrapper

<span class="hljs-meta">        @wraps(func)</span>
        <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">wrapper</span>(<span class="hljs-params">*args, **kwargs</span>):</span>
            start = time.perf_counter()
            result = func(*args, **kwargs)
            end = time.perf_counter()
            print(<span class="hljs-string">f"Total time taken for <span class="hljs-subst">{operation_name}</span>: <span class="hljs-subst">{end - start}</span>."</span>)
            <span class="hljs-keyword">return</span> result
        <span class="hljs-keyword">return</span> wrapper
    <span class="hljs-keyword">return</span> decorator
</code></pre>
<p><strong>Asyncio implementation:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">from</span> api <span class="hljs-keyword">import</span> AnimeAPI
<span class="hljs-keyword">from</span> asyncio_api <span class="hljs-keyword">import</span> AsyncAnimeAPI
<span class="hljs-keyword">from</span> timedecorator <span class="hljs-keyword">import</span> timedec


<span class="hljs-meta">@timedec('synchronous execution')</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">sync_execution</span>(<span class="hljs-params">anime_ids:list</span>):</span>

    anime_api = AnimeAPI()

    <span class="hljs-keyword">for</span> anime_id <span class="hljs-keyword">in</span> anime_ids:
        <span class="hljs-keyword">try</span>:
            _ = anime_api.response(anime_id)
        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            _ = e


<span class="hljs-meta">@timedec('asynchronous execution')</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">async_execution</span>(<span class="hljs-params">anime_ids:list</span>):</span>

    anime_api = AsyncAnimeAPI()

    tasks = [anime_api.response(idx) <span class="hljs-keyword">for</span> idx <span class="hljs-keyword">in</span> anime_ids]
    _ = <span class="hljs-keyword">await</span> asyncio.gather(*tasks)


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    anime_ids = [<span class="hljs-number">100</span>,<span class="hljs-number">200</span>,<span class="hljs-number">300</span>,<span class="hljs-number">400</span>, <span class="hljs-number">700</span>, <span class="hljs-number">628</span>, <span class="hljs-number">524</span>, <span class="hljs-number">377</span>, <span class="hljs-number">826</span>, <span class="hljs-number">451</span>, <span class="hljs-number">280</span>, <span class="hljs-number">395</span>, <span class="hljs-number">399</span>, <span class="hljs-number">124</span>, <span class="hljs-number">626</span>]
    sync_execution(anime_ids)
    asyncio.run(async_execution(anime_ids))
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-bash">Total time taken <span class="hljs-keyword">for</span> synchronous execution: 7.806094500003383.
Total time taken <span class="hljs-keyword">for</span> asynchronous execution: 3.0999071896076202e-06.
</code></pre>
<p><strong>When working with asyncio, blocking functions can freeze the event loop.</strong><br /><code>asyncio.to_thread()</code> lets you run those blocking operations in a separate thread, so your async program remains responsive.<br />Demo:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pprint <span class="hljs-keyword">import</span> pprint
<span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">from</span> api <span class="hljs-keyword">import</span> AnimeAPI <span class="hljs-comment">#Refer to this API code in 'Multithreading' section</span>


<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_response</span>(<span class="hljs-params">api:AnimeAPI, anime_id:int</span>):</span>
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> asyncio.to_thread(api.response, anime_id)


<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>

    taskgroup_anime_ids = [<span class="hljs-number">100</span>,<span class="hljs-number">200</span>,<span class="hljs-number">300</span>,<span class="hljs-number">400</span>,<span class="hljs-number">600</span>]

    anime_api = AnimeAPI()

    <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> asyncio.TaskGroup() <span class="hljs-keyword">as</span> tg:
        tasks = [tg.create_task(get_response(anime_api, idx)) <span class="hljs-keyword">for</span> idx <span class="hljs-keyword">in</span> taskgroup_anime_ids]

    results = [task.result() <span class="hljs-keyword">for</span> task <span class="hljs-keyword">in</span> tasks]

    pprint(results)


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    asyncio.run(main())
</code></pre>
<pre><code class="lang-bash">[{<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 100,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: <span class="hljs-string">'Prétear: The New Legend of Snow '</span>
                                          <span class="hljs-string">'White'</span>,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'新白雪姫伝説プリーティア'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Shin Shirayuki-hime Densetsu '</span>
                                         <span class="hljs-string">'Pretear'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 200,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: None,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'天使な小生意気'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Tenshi na Konamaiki'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 300,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: <span class="hljs-string">'3x3 Eyes'</span>,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'3×3EYES'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'3x3 EYES'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 400,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: <span class="hljs-string">'Outlaw Star'</span>,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'星方武侠アウトロースター'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Seihou Bukyou Outlaw Star'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 600,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: None,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'レジェンドオブ・デュオ'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Legend of Duo'</span>}}}}]
</code></pre>
<p><strong>TaskGroup:</strong><br /><code>asyncio.TaskGroup</code> provides structured concurrency in Python. Instead of manually creating and managing tasks, a TaskGroup groups related asynchronous operations together and guarantees that they are all tracked, awaited, and cleaned up safely.<br />When you create tasks inside a TaskGroup, all of them run concurrently, and the block does not exit until every task completes. If any task raises an exception, the TaskGroup automatically cancels the remaining tasks and propagates the error in a predictable way. This prevents “orphan” background tasks, missing results, and silent failures that commonly occur when managing tasks manually with <code>create_task()</code>.<br />In simple terms, TaskGroup makes asynchronous code safer and more reliable by enforcing lifecycle management for tasks, so you do not have to do it yourself.</p>
<p><strong>gather vs TaskGroup:</strong><br /><code>asyncio.gather()</code> runs multiple coroutines and waits for their results, but its error handling is looser: by default, the first exception propagates while the remaining tasks keep running in the background.<br /><code>TaskGroup</code> provides <strong>structured concurrency</strong>: tasks belong to a group, a failure in one cancels its siblings, errors are propagated predictably, and no task is accidentally left running in the background.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pprint <span class="hljs-keyword">import</span> pprint
<span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">from</span> asyncio_api <span class="hljs-keyword">import</span> AsyncAnimeAPI


<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>

    taskgroup_anime_ids = [<span class="hljs-number">100</span>,<span class="hljs-number">200</span>,<span class="hljs-number">300</span>,<span class="hljs-number">400</span>]
    independent_anime_ids = [<span class="hljs-number">600</span>,<span class="hljs-number">853</span>]

    anime_api = AsyncAnimeAPI()

    <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> asyncio.TaskGroup() <span class="hljs-keyword">as</span> tg:
        tasks = [tg.create_task(anime_api.response(idx)) <span class="hljs-keyword">for</span> idx <span class="hljs-keyword">in</span> taskgroup_anime_ids]

    results = [task.result() <span class="hljs-keyword">for</span> task <span class="hljs-keyword">in</span> tasks]

    tasks2 = [asyncio.create_task(anime_api.response(idx)) <span class="hljs-keyword">for</span> idx <span class="hljs-keyword">in</span> independent_anime_ids]

    results.extend(<span class="hljs-keyword">await</span> asyncio.gather(*tasks2))

    pprint(results)


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    asyncio.run(main())
</code></pre>
<pre><code class="lang-bash">[{<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 100,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: <span class="hljs-string">'Prétear: The New Legend of Snow '</span>
                                          <span class="hljs-string">'White'</span>,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'新白雪姫伝説プリーティア'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Shin Shirayuki-hime Densetsu '</span>
                                         <span class="hljs-string">'Pretear'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 200,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: None,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'天使な小生意気'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Tenshi na Konamaiki'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 300,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: <span class="hljs-string">'3x3 Eyes'</span>,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'3×3EYES'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'3x3 EYES'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 400,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: <span class="hljs-string">'Outlaw Star'</span>,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'星方武侠アウトロースター'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Seihou Bukyou Outlaw Star'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 600,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: None,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'レジェンドオブ・デュオ'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Legend of Duo'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 853,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: <span class="hljs-string">'Ouran High School Host Club'</span>,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'桜蘭高校ホスト部'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Ouran Koukou Host Club'</span>}}}}]
</code></pre>
<p><strong>Some more useful concepts in</strong> <code>asyncio</code> <strong>you can refer to:</strong></p>
<p><strong>Semaphore</strong><br />A semaphore is used to limit how many coroutines can run a specific piece of code at the same time. This is useful when hitting APIs with rate limits, restricting database connections, or controlling access to limited resources.<br />You acquire it before running a task and release it when done. If the limit is reached, other coroutines wait.<br />In short: <strong>Semaphore = controlled concurrency instead of unlimited concurrency.</strong></p>
<p><strong>AnimeAPI integrated with Semaphore:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">from</span> asyncio_api <span class="hljs-keyword">import</span> AsyncAnimeAPI


<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">SemaphoreDemo</span>(<span class="hljs-params">AsyncAnimeAPI</span>):</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, sem: asyncio.Semaphore</span>):</span>
        super().__init__()
        self.sem = sem

    <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">worker</span>(<span class="hljs-params">self, anime_id: int</span>):</span>
        print(<span class="hljs-string">f"Task <span class="hljs-subst">{anime_id}</span>: Waiting for semaphore..."</span>)

        <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> self.sem:
            print(<span class="hljs-string">f"Task <span class="hljs-subst">{anime_id}</span>: Acquired semaphore, starting request"</span>)
            result = <span class="hljs-keyword">await</span> self.response(anime_id)
            print(<span class="hljs-string">f"Task <span class="hljs-subst">{anime_id}</span>: Request completed"</span>)

        print(<span class="hljs-string">f"Task <span class="hljs-subst">{anime_id}</span>: Released semaphore"</span>)
        <span class="hljs-keyword">return</span> result
</code></pre>
<p><strong>Implementing Semaphore with value 2:</strong><br />This will allow only 2 operations to run concurrently at a time.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">from</span> sem <span class="hljs-keyword">import</span> SemaphoreDemo


<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    semaphore = asyncio.Semaphore(<span class="hljs-number">2</span>)
    demo = SemaphoreDemo(semaphore)

    anime_ids = [<span class="hljs-number">100</span>,<span class="hljs-number">200</span>,<span class="hljs-number">300</span>,<span class="hljs-number">400</span>]

    <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> asyncio.TaskGroup() <span class="hljs-keyword">as</span> tg:
        tasks = [tg.create_task(demo.worker(idx)) <span class="hljs-keyword">for</span> idx <span class="hljs-keyword">in</span> anime_ids]

    _ = [task.result() <span class="hljs-keyword">for</span> task <span class="hljs-keyword">in</span> tasks]


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
    asyncio.run(main())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-bash">Task 100: Waiting <span class="hljs-keyword">for</span> semaphore...
Task 100: Acquired semaphore, starting request
Task 200: Waiting <span class="hljs-keyword">for</span> semaphore...
Task 200: Acquired semaphore, starting request
Task 300: Waiting <span class="hljs-keyword">for</span> semaphore...
Task 400: Waiting <span class="hljs-keyword">for</span> semaphore...
Task 200: Request completed
Task 200: Released semaphore
Task 300: Acquired semaphore, starting request
Task 100: Request completed
Task 100: Released semaphore
Task 400: Acquired semaphore, starting request
Task 300: Request completed
Task 300: Released semaphore
Task 400: Request completed
Task 400: Released semaphore
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">The output above shows how the Semaphore limits execution to 2 concurrent operations, as configured.</div>
</div>

<p><strong>Lock</strong><br />A lock ensures that only one coroutine accesses a shared resource at a time. This prevents race conditions when multiple coroutines try to modify shared state.</p>
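<p>A minimal, self-contained sketch (the shared <code>counter</code> is illustrative, not part of the anime API above) of a lock preventing a lost-update race:</p>
<pre><code class="lang-python">import asyncio

counter = 0

async def increment(lock: asyncio.Lock):
    global counter
    async with lock:              # only one coroutine mutates the counter at a time
        current = counter
        await asyncio.sleep(0)    # yield control; without the lock this is a classic race
        counter = current + 1

async def main():
    lock = asyncio.Lock()
    await asyncio.gather(*(increment(lock) for _ in range(100)))

asyncio.run(main())
print(counter)  # 100
</code></pre>
<p>Remove the <code>async with lock:</code> line and some increments get lost, because another coroutine reads <code>counter</code> between the read and the write.</p>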
<p><strong>Event</strong><br />An event is a signaling mechanism. One coroutine can set an event, and others waiting on it will resume. Useful for coordination between tasks.</p>
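<p>A tiny illustration (hypothetical names, not the API class from earlier) of one coroutine releasing several waiters with a single <code>set()</code>:</p>
<pre><code class="lang-python">import asyncio

resumed = []

async def waiter(event: asyncio.Event, name: str):
    await event.wait()            # suspend until some coroutine calls event.set()
    resumed.append(name)

async def main():
    event = asyncio.Event()
    tasks = [asyncio.create_task(waiter(event, f"task-{i}")) for i in range(3)]
    await asyncio.sleep(0)        # let the waiters reach event.wait()
    event.set()                   # one signal wakes every waiter
    await asyncio.gather(*tasks)

asyncio.run(main())
print(sorted(resumed))  # ['task-0', 'task-1', 'task-2']
</code></pre>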
<p><strong>Queue</strong><br /><code>asyncio.Queue</code> is designed for asynchronous producer–consumer pipelines. Producers put items in the queue, consumers await them. It provides built-in backpressure and prevents uncontrolled task growth.</p>
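<p>A minimal producer–consumer sketch (names are illustrative) showing the sentinel pattern, and how a small <code>maxsize</code> gives backpressure: <code>put()</code> suspends until the consumer frees a slot.</p>
<pre><code class="lang-python">import asyncio

async def producer(queue: asyncio.Queue):
    for i in range(5):
        await queue.put(i)        # suspends when the queue is full: built-in backpressure
    await queue.put(None)         # sentinel telling the consumer to stop

async def consumer(queue: asyncio.Queue):
    results = []
    while True:
        item = await queue.get()
        if item is None:
            break
        results.append(item * 2)
    return results

async def main():
    queue = asyncio.Queue(maxsize=2)   # a deliberately small buffer forces flow control
    prod = asyncio.create_task(producer(queue))
    results = await consumer(queue)
    await prod
    return results

doubled = asyncio.run(main())
print(doubled)  # [0, 2, 4, 6, 8]
</code></pre>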
<p><strong>Cancellation &amp; Timeouts</strong><br />Asyncio supports cooperative cancellation. Tasks should be written to handle cancellation cleanly. <code>asyncio.wait_for()</code> or timeouts on APIs ensure long-running tasks don’t freeze the system.</p>
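<p>A hedged sketch (the <code>slow_operation</code> coroutine is a stand-in, not from the API wrapper above) of how <code>asyncio.wait_for()</code> turns a hung call into a handled timeout:</p>
<pre><code class="lang-python">import asyncio

async def slow_operation():
    await asyncio.sleep(10)       # stands in for a long-running request
    return "done"

async def main():
    try:
        return await asyncio.wait_for(slow_operation(), timeout=0.1)
    except asyncio.TimeoutError:  # wait_for cancels the task, then raises
        return "timed out"

outcome = asyncio.run(main())
print(outcome)  # timed out
</code></pre>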
<p><strong>Backpressure and Flow Control</strong><br />Asyncio doesn’t magically solve overload problems. Use queues, semaphores, and proper design to avoid overwhelming external systems or your own application.</p>
<hr />
<h2 id="heading-comparison-table-for-the-concurrency-models">Comparison table for the concurrency models</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Aspect</td><td><strong>Threading</strong></td><td><strong>Multiprocessing</strong></td><td><strong>Asyncio</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Best For</strong></td><td>I/O-bound tasks using blocking libraries</td><td>CPU-bound tasks needing true parallelism</td><td>Massive I/O concurrency using async APIs</td></tr>
<tr>
<td><strong>Execution Model</strong></td><td>Multiple OS threads in one process</td><td>Multiple independent processes</td><td>Single thread, event loop, cooperative multitasking</td></tr>
<tr>
<td><strong>GIL Impact</strong></td><td>Affected by GIL → only one thread executes Python bytecode at a time</td><td>Not affected (each process has its own interpreter &amp; GIL)</td><td>Not affected in typical usage; only one coroutine runs at a time by design</td></tr>
<tr>
<td><strong>Parallel CPU Execution</strong></td><td>❌ No real parallelism for Python code</td><td>✅ True parallelism across CPU cores</td><td>❌ No (unless explicitly offloading work)</td></tr>
<tr>
<td><strong>I/O Handling</strong></td><td>Works well because many I/O ops release GIL internally</td><td>Works but heavyweight for I/O</td><td>Best for non-blocking I/O (<code>await</code>)</td></tr>
<tr>
<td><strong>Scalability</strong></td><td>Moderate (dozens to hundreds of threads)</td><td>Good but expensive (RAM + process startup)</td><td>Excellent (thousands of concurrent tasks)</td></tr>
<tr>
<td><strong>Memory Usage</strong></td><td>Low to moderate</td><td>High (separate memory per process)</td><td>Very low</td></tr>
<tr>
<td><strong>Complexity</strong></td><td>Simple mental model</td><td>Moderate; requires IPC and data serialization</td><td>Higher learning curve; requires async-aware ecosystem</td></tr>
<tr>
<td><strong>Good Use Cases</strong></td><td>File I/O, blocking network calls, background workers, GUI apps</td><td>Data processing, ML workloads, CPU crunching, parallel computation</td><td>APIs, web servers, scraping at scale, chat apps, streaming</td></tr>
<tr>
<td><strong>Bad Use Cases</strong></td><td>CPU-bound workloads</td><td>Extremely lightweight tasks that don’t justify process overhead</td><td>CPU-bound work or blocking libraries</td></tr>
<tr>
<td><strong>Typical Tools</strong></td><td><code>threading</code>, <code>ThreadPoolExecutor</code></td><td><code>multiprocessing</code>, <code>ProcessPoolExecutor</code></td><td><code>asyncio</code>, <code>aiohttp</code>, async DB drivers</td></tr>
<tr>
<td><strong>Failure Mode if Misused</strong></td><td>Thread explosion, overhead</td><td>High overhead, complex debugging, shared state pain</td><td>Event loop freezes if blocking code sneaks in</td></tr>
</tbody>
</table>
</div><h2 id="heading-quick-rule-of-thumb-choose-concurrency-model-for-my-task"><strong>Quick rule of thumb:</strong> choosing a concurrency model for your task</h2>
<ul>
<li><p><strong>I/O + blocking libs + manageable concurrency</strong> → Threading</p>
</li>
<li><p><strong>CPU-bound, need real parallel speedup</strong> → Multiprocessing</p>
</li>
<li><p><strong>Huge I/O concurrency + async ecosystem available</strong> → Asyncio</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Zenalyze: My AI-Assisted Data Analysis Tool (And Why I Built It)]]></title><description><![CDATA[Most AI “data analysis” tools today fall into two groups:

They pretend to analyze your data but don’t actually run code.

They demand you upload your data to some cloud black box.


Neither works for real-world analytics.
I wanted something differen...]]></description><link>https://techtrail.tuhindutta.com/zenalyze-ai-assisted-data-analysis-tool</link><guid isPermaLink="true">https://techtrail.tuhindutta.com/zenalyze-ai-assisted-data-analysis-tool</guid><category><![CDATA[Analytics Agent]]></category><category><![CDATA[AI]]></category><category><![CDATA[llm]]></category><category><![CDATA[Python]]></category><category><![CDATA[data analysis]]></category><category><![CDATA[generative ai]]></category><category><![CDATA[Developer Tools]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[AI Coding Assistant]]></category><category><![CDATA[Productivity]]></category><category><![CDATA[python libraries]]></category><category><![CDATA[pandas]]></category><category><![CDATA[PySpark]]></category><category><![CDATA[automation]]></category><dc:creator><![CDATA[Tuhin Kumar Dutta]]></dc:creator><pubDate>Mon, 17 Nov 2025 16:27:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/-WXQm_NTK0U/upload/3b5d0ceaab468bde4c2a6379e497e0d9.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most AI “data analysis” tools today fall into two groups:</p>
<ol>
<li><p><strong>They pretend to analyze your data but don’t actually run code.</strong></p>
</li>
<li><p><strong>They demand you upload your data to some cloud black box.</strong></p>
</li>
</ol>
<p>Neither works for real-world analytics.</p>
<p>I wanted something different.<br />Something that could sit right in my local environment, understand my tables, generate real Python code, execute it, and help me explore data the same way an actual teammate would.</p>
<p>That’s where <strong>Zenalyze</strong> came from — a lightweight package that turns LLMs into a practical coding partner <em>without ever exposing your actual data values</em>.</p>
<p><a target="_blank" href="https://tuhindutta.github.io/Zenalyze/"><strong><em>GitHub Package Documentation</em></strong></a></p>
<p>Let me walk you through the motivation, design thinking, and how it fits into a real workflow.</p>
<hr />
<h2 id="heading-the-problem-i-wanted-to-solve">🧩 The Problem I Wanted to Solve</h2>
<p>Anyone working with Pandas or PySpark knows the cycle:</p>
<ul>
<li><p>Load data</p>
</li>
<li><p>Look at shapes, missing values, weird fields</p>
</li>
<li><p>Write a bunch of boilerplate</p>
</li>
<li><p>Rinse and repeat for every analysis step</p>
</li>
</ul>
<p>And every time you want to try something new, you end up rewriting the same code:</p>
<pre><code class="lang-python">df.groupby(...).agg(...)
df.merge(...)
df.plot(...)
</code></pre>
<p>I wanted a tool that handled this repetitive side of analysis, while still letting me remain in control of the code. Something that generates real Python, runs in my own environment, and behaves predictably.</p>
<hr />
<h2 id="heading-the-motivation-behind-zenalyze">🎯 The Motivation Behind Zenalyze</h2>
<p>A few core ideas shaped the project:</p>
<h3 id="heading-1-llms-should-help-you-code-not-replace-your-environment">1. <strong>LLMs should help you code, not replace your environment</strong></h3>
<p>I didn’t want a chatbot that tells me what <em>could</em> work.<br />I wanted a companion that writes actual code I can run right away.</p>
<h3 id="heading-2-your-data-never-leaves-your-machine">2. <strong>Your data never leaves your machine</strong></h3>
<p>If you’re analyzing customer revenue, fraud records, supply chain data, medical outcomes — the last thing you want is your rows flying off into the internet.</p>
<p>Zenalyze only sends <strong>metadata</strong>, not data.</p>
<h3 id="heading-3-history-aware-analysis">3. <strong>History-aware analysis</strong></h3>
<p>LLMs forget.<br />Data analysts don’t have time to babysit them.</p>
<p>Zenalyze:</p>
<ul>
<li><p>tracks every step</p>
</li>
<li><p>remembers derived columns</p>
</li>
<li><p>summarizes past actions</p>
</li>
<li><p>reuses existing variables</p>
</li>
<li><p>never re-imports Pandas/Spark unnecessarily</p>
</li>
</ul>
<p>So the conversation stays consistent, and the code becomes cleaner over time.</p>
<h3 id="heading-4-make-the-experience-fun">4. <strong>Make the experience fun</strong></h3>
<p>I didn’t want another “heavy enterprise tool”.<br />Just a friendly, intelligent coding buddy in my notebook.</p>
<hr />
<h2 id="heading-security-close-to-your-data-never-inside-it">🔐 Security: Close to Your Data, Never Inside It</h2>
<p>One thing I was very firm about:<br /><strong>Zenalyze should never see raw data.</strong></p>
<p>And it doesn’t.</p>
<p>It only extracts and uses:</p>
<ul>
<li><p>column names</p>
</li>
<li><p>descriptions</p>
</li>
<li><p>data types</p>
</li>
<li><p>row/column counts</p>
</li>
<li><p>null percentages</p>
</li>
<li><p>high-level distributions</p>
</li>
<li><p>patterns</p>
</li>
<li><p>derived fields created in earlier steps</p>
</li>
</ul>
<p>This is enough to provide context for the LLM to generate correct code, but not enough to reveal anything sensitive.</p>
<p>Think of it as letting someone read your database schema without giving them access to the rows.</p>
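<p>As a rough illustration of the idea (not Zenalyze’s actual internals), a schema-only payload like this could be assembled with plain pandas:</p>
<pre><code class="lang-python">import pandas as pd

def extract_metadata(df):
    """Collect schema-level facts only; no row values leave this function."""
    return {
        "columns": list(df.columns),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "n_rows": len(df),
        "n_cols": df.shape[1],
        "null_pct": (df.isna().mean() * 100).round(2).to_dict(),
    }

df = pd.DataFrame({"customer_id": [1, 2, None], "region": ["EU", "US", "EU"]})
meta = extract_metadata(df)
print(meta["null_pct"])  # {'customer_id': 33.33, 'region': 0.0}
</code></pre>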
<h3 id="heading-but-use-it-responsibly">But use it responsibly.</h3>
<p>Even though Zenalyze never touches actual records, good practice is to run it in an <strong>isolated and monitored environment</strong>:</p>
<ul>
<li><p>Jupyter inside a virtual environment</p>
</li>
<li><p>Controlled outbound/inbound network rules</p>
</li>
<li><p>No access to production systems</p>
</li>
<li><p>Zero trust toward external LLMs you didn’t configure</p>
</li>
</ul>
<p>Because yes — while Zenalyze won’t misbehave on its own, a malicious or rogue LLM can try to generate harmful code.<br />Not likely unless someone purposely built an LLM for chaos, but still worth mentioning.</p>
<p>Smart tools deserve smart environments.</p>
<hr />
<h2 id="heading-what-zenalyze-actually-does">🤝 What Zenalyze Actually Does</h2>
<p>When you interact with it:</p>
<pre><code class="lang-python">zen.do(<span class="hljs-string">"calculate total revenue per customer"</span>)
</code></pre>
<p>It does a few things behind the scenes:</p>
<ul>
<li><p>builds a detailed prompt with metadata</p>
</li>
<li><p>injects the correct dataset references</p>
</li>
<li><p>generates the Python code</p>
</li>
<li><p>executes the code right in your environment</p>
</li>
<li><p>saves the result as a variable</p>
</li>
<li><p>remembers what you just did</p>
</li>
<li><p>lets you ask follow-up questions through the buddy:</p>
</li>
</ul>
<pre><code class="lang-python">zen.buddy(<span class="hljs-string">"Explain what we did in the last step"</span>)
</code></pre>
<p>It’s smooth, predictable, and feels like working with a junior analyst who never gets tired.</p>
<hr />
<h2 id="heading-why-its-called-zenalyze">🧘 Why It's Called <em>Zen</em>alyze</h2>
<p>Because the tool’s job is to take the chaotic part of exploratory analysis — the constant back-and-forth, the rewriting, the checking, the clutter — and make it calm, clean, and focused.</p>
<p>Data work shouldn’t feel like fighting your tools.<br />It should feel like thinking clearly.</p>
<p>That’s the vibe.</p>
<hr />
<h2 id="heading-setup-amp-environment-notes">🛠️ Setup &amp; Environment Notes</h2>
<p>Before we get into the demo, a few practical reminders:</p>
<ul>
<li><p>Always use a <code>.env</code> file for API keys</p>
</li>
<li><p>Keep your environment isolated (venv/conda)</p>
</li>
<li><p>Monitor outbound connections</p>
</li>
<li><p>Use secure LLM providers you trust</p>
</li>
<li><p>Keep datasets local or on controlled Spark clusters</p>
</li>
</ul>
<p>Zenalyze integrates tightly with Pandas and PySpark, so as long as your environment is tidy, the experience will be clean.</p>
<hr />
<h2 id="heading-installation">📦 Installation</h2>
<p>Once the package is on PyPI:</p>
<pre><code class="lang-bash">pip install zenalyze
</code></pre>
<p>Or directly from GitHub:</p>
<pre><code class="lang-bash">pip install git+https://github.com/tuhindutta/Zenalyze.git
</code></pre>
<hr />
<h2 id="heading-demo-time">🚀 Demo Time</h2>
<p>Now let’s actually use Zenalyze and see what it feels like in a real environment.</p>
<p>Don’t worry — this part is straightforward. No complicated infra, no scary configs.<br />Just a clean Python setup and a couple of environment variables.</p>
<hr />
<h3 id="heading-1-create-a-virtual-environment">1️⃣ Create a Virtual Environment</h3>
<p>Always start in a clean workspace. It keeps things tidy and avoids package mess.</p>
<pre><code class="lang-bash">python -m venv .venv
source .venv/bin/activate      # Mac/Linux
# or
.\.venv\Scripts\activate       # Windows
</code></pre>
<p>You should now see <code>(.venv)</code> in your terminal prompt.</p>
<hr />
<h3 id="heading-2-install-zenalyze">2️⃣ Install Zenalyze</h3>
<p>If you installed from GitHub:</p>
<pre><code class="lang-bash">pip install git+https://github.com/tuhindutta/Zenalyze.git
</code></pre>
<p>Once it's on PyPI, you'll switch to:</p>
<pre><code class="lang-bash">pip install zenalyze
</code></pre>
<hr />
<h3 id="heading-3-add-your-environment-variables">3️⃣ Add Your Environment Variables</h3>
<p>Zenalyze uses three environment variables.<br />You can put them in a <code>.env</code> file, export them directly, or load them through your preferred method.</p>
<p><strong>Required / Optional Env Variables</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Variable</td><td>Purpose</td><td>Default</td></tr>
</thead>
<tbody>
<tr>
<td><code>MODEL</code></td><td>Main LLM for code generation</td><td><code>openai/gpt-oss-120b</code></td></tr>
<tr>
<td><code>GROQ_API_KEY</code></td><td>API key for your LLM provider</td><td><strong>none</strong> (must provide if using Groq)</td></tr>
<tr>
<td><code>BUDDY_MODEL</code></td><td>LLM for natural-language buddy responses</td><td><code>openai/gpt-oss-120b</code></td></tr>
<tr>
<td><code>CODE_SUMMARIZER_MODEL</code></td><td>LLM for summarizing long code histories</td><td><code>openai/gpt-oss-120b</code></td></tr>
</tbody>
</table>
</div><p><strong>Example</strong> <code>.env</code> <strong>file</strong></p>
<p>Create a file named <code>.env</code> in your project folder:</p>
<pre><code class="lang-bash">MODEL=openai/gpt-oss-120b
BUDDY_MODEL=openai/gpt-oss-120b
CODE_SUMMARIZER_MODEL=openai/gpt-oss-120b

GROQ_API_KEY=your_groq_key_here
</code></pre>
<p>Load it using python-dotenv (optional but convenient)</p>
<pre><code class="lang-bash">pip install python-dotenv
</code></pre>
<p>In a notebook or script:</p>
<pre><code class="lang-python">from dotenv import load_dotenv
load_dotenv()
</code></pre>
<p>And you’re good to go.</p>
<hr />
<h3 id="heading-4prepare-your-data-folder-and-optional-description-file">4️⃣Prepare Your Data Folder (and Optional Description File)</h3>
<p>Let’s set up the data Zenalyze will work with.</p>
<p>Start by creating a simple <code>./data</code> directory and drop in a few CSV or Excel files.</p>
<p>Example structure:</p>
<pre><code class="lang-plaintext">project/
 ├── .env
 ├── demo.ipynb
 └── data/
       ├── customers.csv
       ├── orders.csv
       └── desc.json             # optional but highly recommended (discussed below)
</code></pre>
<p>Zenalyze will automatically scan this folder, load the files, and extract metadata like column names, dtypes, null percentages, and patterns.<br />That’s enough for it to start generating clean, context-aware analysis code.</p>
<hr />
<h3 id="heading-optional-but-highly-recommended-add-a-descjson-file">⭐ Optional but Highly Recommended: Add a <code>desc.json</code> File</h3>
<p>If you want Zenalyze to understand <strong>what your tables actually represent</strong> rather than just their structure, you can provide a <code>desc.json</code> file in the working directory.</p>
<p>This file lets you describe, in your own words:</p>
<ul>
<li><p>what each table means</p>
</li>
<li><p>business/domain context</p>
</li>
<li><p>what each column represents</p>
</li>
<li><p>any notes you'd want an analyst to know</p>
</li>
</ul>
<p>There’s no strict formatting rule — you can phrase descriptions however you prefer.<br />The <strong>only requirement</strong> is that the top-level keys match your table names <strong>without file extensions</strong>.</p>
<p>For example:</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"customers"</span>: {
        <span class="hljs-attr">"data_desc"</span>: <span class="hljs-string">"customer master table"</span>,
        <span class="hljs-attr">"columns_desc"</span>: {
            <span class="hljs-attr">"customer_id"</span>: <span class="hljs-string">"unique customer identifier"</span>,
            <span class="hljs-attr">"region"</span>: <span class="hljs-string">"geographical region"</span>
        }
    },

    <span class="hljs-attr">"orders"</span>: {
        <span class="hljs-attr">"data_desc"</span>: <span class="hljs-string">"transaction-level order data"</span>,
        <span class="hljs-attr">"columns_desc"</span>: {
            <span class="hljs-attr">"order_id"</span>: <span class="hljs-string">"unique id for each order"</span>,
            <span class="hljs-attr">"amount"</span>: <span class="hljs-string">"order total value"</span>
        }
    }
}
</code></pre>
<p>Name this file exactly <code>desc.json</code> and place it inside <code>data/</code>.</p>
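<p>To make the one naming rule concrete, here is a hypothetical check (not part of Zenalyze) that verifies the top-level keys of <code>desc.json</code> match the CSV file names with extensions stripped:</p>
<pre><code class="lang-python">import json
import tempfile
from pathlib import Path

def desc_keys_match(data_dir):
    data_path = Path(data_dir)
    desc = json.loads((data_path / "desc.json").read_text())
    table_stems = {p.stem for p in data_path.glob("*.csv")}
    # every top-level key must be a data file name without its extension
    return set(desc).issubset(table_stems)

# tiny self-test with a throwaway folder
with tempfile.TemporaryDirectory() as d:
    Path(d, "customers.csv").write_text("customer_id,region\n")
    Path(d, "orders.csv").write_text("order_id,amount\n")
    Path(d, "desc.json").write_text(json.dumps({"customers": {}, "orders": {}}))
    ok = desc_keys_match(d)
print(ok)  # True
</code></pre>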
<hr />
<h3 id="heading-dont-want-to-write-it-manually">🤝 Don’t Want to Write It Manually?</h3>
<p>Zenalyze can generate a template for you.</p>
<p>Once Zenalyze is initialized, just run:</p>
<pre><code class="lang-python">zen.create_description_template_file(forced=<span class="hljs-literal">True</span>)
</code></pre>
<p>This will create a <code>desc.json</code> template file inside the appropriate destination — you only need to fill in the details and reinitialize the Zenalyze instance.</p>
<hr />
<h3 id="heading-5-initialize-zenalyze">5️⃣ Initialize Zenalyze</h3>
<p>Fire up Jupyter Notebook and inside your notebook:</p>
<pre><code class="lang-python">from zenalyze import create_zenalyze_object_with_env_var_and_last5_hist

zen = create_zenalyze_object_with_env_var_and_last5_hist(globals(), <span class="hljs-string">"./data"</span>)
</code></pre>
<p>This does a lot for you:</p>
<ul>
<li><p>loads your datasets</p>
</li>
<li><p>extracts metadata</p>
</li>
<li><p>sets up history retention</p>
</li>
<li><p>configures the LLM models</p>
</li>
<li><p>prepares your analysis environment</p>
</li>
</ul>
<p>You'll now have variables like <code>customers</code>, <code>orders</code>, etc. injected into your session automatically.</p>
<h3 id="heading-demo-notebookhttpstuhinduttagithubiozenalyzedemopdf"><a target="_blank" href="https://tuhindutta.github.io/Zenalyze/demo.pdf"><strong><em>Demo Notebook</em></strong></a></h3>
<hr />
<h2 id="heading-final-thoughts">🎁 Final Thoughts</h2>
<p>Zenalyze isn’t meant to be another giant enterprise tool with a 100-page manual.</p>
<p>It’s meant to be:</p>
<ul>
<li><p>simple</p>
</li>
<li><p>lightweight</p>
</li>
<li><p>developer-friendly</p>
</li>
<li><p>safe</p>
</li>
<li><p>genuinely helpful</p>
</li>
</ul>
<p>If it makes data exploration even a little bit smoother, cleaner, or more fun — it’s doing its job.</p>
<p>And this is only the beginning.</p>
]]></content:encoded></item><item><title><![CDATA[Machine Learning Experimentation with Airflow]]></title><description><![CDATA[Problem Statement
While experimenting with machine learning models — tuning hyperparameters with Bayesian methods, running cross-validations, and optimizing trials using Optuna with MLflow tracking — I found myself constantly fighting the same proble...]]></description><link>https://techtrail.tuhindutta.com/machine-learning-experimentation-with-airflow</link><guid isPermaLink="true">https://techtrail.tuhindutta.com/machine-learning-experimentation-with-airflow</guid><category><![CDATA[Devops]]></category><category><![CDATA[mlops]]></category><category><![CDATA[apache-airflow]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[CI/CD]]></category><category><![CDATA[#Nexus]]></category><category><![CDATA[Python]]></category><category><![CDATA[mlflow]]></category><category><![CDATA[data-engineering]]></category><category><![CDATA[#model-deployment]]></category><category><![CDATA[automation]]></category><category><![CDATA[experiment tracking]]></category><dc:creator><![CDATA[Tuhin Kumar Dutta]]></dc:creator><pubDate>Tue, 04 Nov 2025 03:29:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762185058285/cf7d9aa7-f356-4d71-b583-f9288b8c6998.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-problem-statement">Problem Statement</h2>
<p>While experimenting with machine learning models — tuning hyperparameters with Bayesian methods, running cross-validations, and optimizing trials using <strong>Optuna</strong> with <strong>MLflow</strong> tracking — I found myself constantly fighting the same problem: <strong>code dependency and fragility</strong>.</p>
<p>Every new experiment required changing code, updating configurations, and risking breakages. Switching datasets or parameters meant tweaking multiple scripts, rerunning environment setups, and occasionally debugging things that weren’t even related to the experiment.</p>
<p>It became clear that I needed a <strong>stable, self-contained environment</strong> that could:</p>
<ul>
<li><p>Handle <strong>ETL and ML pipeline orchestration</strong> automatically.</p>
</li>
<li><p>Let me experiment safely <strong>without touching core code</strong>.</p>
</li>
<li><p>Log and version results in MLflow.</p>
</li>
<li><p>Be reproducible across systems — from my laptop to a CI/CD server.</p>
</li>
</ul>
<p>That realization drove the motivation behind this framework — an <strong>Airflow-based, Jenkins-triggered, Docker-deployed system</strong> designed to give me a ready-to-use experimentation environment.</p>
<hr />
<h2 id="heading-motivation">Motivation</h2>
<p>The idea wasn’t just to automate; it was to <strong>decouple experimentation from infrastructure</strong>. What I wanted instead was <strong>a repeatable and isolated environment</strong> where I could:</p>
<ul>
<li><p>Spin up an <strong>ETL + training pipeline</strong> without touching the core code.</p>
</li>
<li><p><strong>Experiment safely</strong> using parameters or configurations, not manual edits.</p>
</li>
<li><p>Automatically <strong>log, register, and version models</strong> via MLflow.</p>
</li>
<li><p>Keep the entire system <strong>portable and rebuildable</strong> — same behavior on any machine or CI runner.</p>
</li>
<li><p>Move experimentation from “code edits” to <strong>config- and parameter-based triggers</strong>.</p>
</li>
</ul>
<p>Essentially, I wanted the <strong>luxury of iteration speed without the anxiety of setup</strong>.</p>
<p>That’s when I started designing an <strong>Airflow-based framework that could self-deploy via Jenkins</strong>, using <strong>Docker Compose</strong> for orchestration and <strong>Nexus</strong> for controlled dependency management. The result is a <strong>self-contained Airflow framework</strong> designed to:</p>
<ul>
<li><p>Launch via <strong>Jenkins CI/CD</strong> with a single trigger.</p>
</li>
<li><p>Use <strong>Docker Compose</strong> to orchestrate an <strong>Airflow Celery cluster</strong> for distributed tasks.</p>
</li>
<li><p>Dynamically install <strong>custom Python dependencies from a private Nexus repo</strong>, without editing the Docker image manually.</p>
</li>
<li><p>Provide a <strong>ready-to-run sandbox</strong> for model training, ETL pipelines, and experiment tracking.</p>
</li>
</ul>
<hr />
<h2 id="heading-implementation-overview">Implementation Overview</h2>
<p>For those interested in the complete technical setup — including Dockerfile, Jenkins pipeline, and Docker Compose configurations — I’ve documented everything in detail here:<br />🔗 <a target="_blank" href="https://tuhindutta.github.io/airflow-setup-celery-framework/"><strong>Airflow Celery Framework Documentation</strong></a></p>
<p>Below, I’ll focus on how the system works internally — the practical workflow, integration points, and reasoning behind some of the implementation choices.</p>
<h3 id="heading-1-the-jenkins-cicd-pipeline-the-automation-backbone">1. The Jenkins CI/CD Pipeline — The Automation Backbone</h3>
<p>Jenkins is the central automation layer.<br />It eliminates the need for manual Docker builds or direct command-line work.<br />Instead, a user (or a scheduled trigger) starts a build with configurable parameters such as:</p>
<ul>
<li><p><code>NEXUS_URL</code> → private PyPI/Nexus repository URL</p>
</li>
<li><p><code>NEXUS_CREDS_ID</code> → Jenkins credentials ID for Nexus authentication</p>
</li>
<li><p><code>DEV_DIR</code> → target build directory for staging</p>
</li>
<li><p><code>REQUIREMENTS</code> &amp; <code>CUSTOM_REQUIREMENTS</code> → dependency lists to pull at runtime</p>
</li>
</ul>
<p>Once triggered, the pipeline:</p>
<ol>
<li><p><strong>Cleans the workspace</strong> and checks out the repo.</p>
</li>
<li><p><strong>Prepares the environment</strong>, creating the build directory and injecting Nexus credentials as Docker BuildKit secrets.</p>
</li>
<li><p><strong>Fetches dependency files dynamically</strong> (curling them from URLs provided in the parameters).</p>
</li>
<li><p><strong>Builds the Airflow image</strong> using those dependencies and secrets.</p>
</li>
<li><p><strong>Runs</strong> <code>docker compose up -d</code> to bring up all Airflow services.</p>
</li>
<li><p><strong>Cleans sensitive files</strong> (Nexus creds, .env, Dockerfile, compose YAML) to keep the environment safe.</p>
</li>
</ol>
<p>This approach ensures every build is <strong>fresh, reproducible, and isolated</strong>, without needing to manually rebuild or edit Dockerfiles.</p>
<h3 id="heading-2-docker-image-design-parameterized-and-secure">2. Docker Image Design — Parameterized and Secure</h3>
<p>The <strong>Dockerfile</strong> extends the official Airflow image (<code>apache/airflow:3.1.0</code>) and is designed to be <strong>parameterized</strong> rather than static.</p>
<p>It introduces:</p>
<ul>
<li><p><code>ARG INDEX_URL</code> and <code>ENV PYPI_URL</code> for flexible dependency sources.</p>
</li>
<li><p>BuildKit secret mounts (<code>nexus_user</code>, <code>nexus_pass</code>) to inject credentials securely.</p>
</li>
<li><p>Multi-layer installs for clean separation between public and private dependencies.</p>
</li>
</ul>
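<p>As a rough Dockerfile sketch of how those pieces could fit together (illustrative only, not the repository’s actual file; it assumes <code>INDEX_URL</code> carries the host-plus-path of the Nexus simple index, e.g. <code>nexus.example.com/repository/pypi/simple</code>):</p>
<pre><code class="lang-dockerfile"># syntax=docker/dockerfile:1
FROM apache/airflow:3.1.0

ARG INDEX_URL
ENV PYPI_URL=${INDEX_URL}

COPY requirements.txt /tmp/requirements.txt

# BuildKit mounts the secrets for this RUN layer only; they never persist in image layers
RUN --mount=type=secret,id=nexus_user,mode=0444 \
    --mount=type=secret,id=nexus_pass,mode=0444 \
    pip install -r /tmp/requirements.txt \
        --index-url "https://$(cat /run/secrets/nexus_user):$(cat /run/secrets/nexus_pass)@${PYPI_URL}"
</code></pre>
<p>Because the credentials arrive via <code>--mount=type=secret</code> rather than <code>ARG</code>, they appear neither in <code>docker history</code> nor in any cached layer.</p>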
<p>This design means you can:</p>
<ul>
<li><p>Swap dependency sets without modifying the Dockerfile.</p>
</li>
<li><p>Point to different Nexus repositories across environments (dev, staging, prod).</p>
</li>
<li><p>Rebuild instantly from Jenkins with zero code edits.</p>
</li>
</ul>
<p>It’s a true <em>“define once, reuse everywhere”</em> model.</p>
<h3 id="heading-3-dependency-management-via-nexus">3. Dependency Management via Nexus</h3>
<p>Instead of pushing all dependencies to PyPI or including them in the repo, private packages are hosted in <strong>Nexus</strong>.</p>
<p>Here’s how the flow works:</p>
<ol>
<li><p>Jenkins reads <code>NEXUS_CREDS_ID</code> and exposes username/password as Docker secrets.</p>
</li>
<li><p>During the build, Docker mounts these credentials temporarily at <code>/run/secrets</code>.</p>
</li>
<li><p>Pip installs private dependencies using the provided <code>INDEX_URL</code> (from Nexus).</p>
</li>
<li><p>The credentials vanish after build completion — never written to image layers or logs.</p>
</li>
</ol>
<p>This method is both <strong>secure</strong> and <strong>scalable</strong>, enabling enterprise-style dependency control with zero manual interference.</p>
<h3 id="heading-4-execution-lifecycle-summary">4. Execution Lifecycle Summary</h3>
<p>Here’s what a full run looks like:</p>
<pre><code class="lang-plaintext">[1] Jenkins Job Triggered (manual or scheduled)
     ↓
[2] Parameters read → environment prepared
     ↓
[3] Nexus credentials injected securely
     ↓
[4] Docker build starts with secrets + dependency files
     ↓
[5] Custom Airflow image built dynamically
     ↓
[6] docker-compose up -d (Airflow + Redis + Postgres + Flower)
     ↓
[7] Secrets &amp; temp files cleaned
     ↓
[8] Airflow UI accessible → ready for DAGs and experiments
</code></pre>
<h3 id="heading-5-design-priorities">5. Design Priorities</h3>
<p>The system was built with three guiding principles:</p>
<ul>
<li><p><strong>Isolation</strong> — every environment is self-contained and disposable.</p>
</li>
<li><p><strong>Reproducibility</strong> — build once, deploy anywhere, get the same behavior.</p>
</li>
<li><p><strong>Security</strong> — credentials never persist, even in intermediate Docker layers.</p>
</li>
</ul>
<p>These principles make it flexible enough for both individual experiments and team-scale deployments.</p>
<hr />
<h2 id="heading-how-i-used-the-framework">How I Used the Framework</h2>
<p>After building the Airflow–Jenkins–Docker setup, I wanted to validate it with an actual end-to-end ML project. The goal was to see if this framework could <strong>handle real experimentation</strong>, versioning, and deployment workflows — not just spin up containers.</p>
<h3 id="heading-1-building-and-versioning-the-project-package">1. Building and Versioning the Project Package</h3>
<p>I started locally with a project that handled <strong>ETL and model training</strong>, fully integrated with <strong>MLflow</strong> for experiment tracking.<br />Instead of running it as loose scripts, I packaged the entire project into a <strong>Python wheel (</strong><code>.whl</code>) using <code>setuptools</code>.</p>
<p>To automate this:</p>
<ul>
<li><p>I created a <strong>CI/CD pipeline in Jenkins</strong> dedicated to building, versioning, and publishing this wheel.</p>
</li>
<li><p>Every run of the pipeline created a new, versioned artifact (e.g., <code>project_name-0.1.4-py3-none-any.whl</code>).</p>
</li>
<li><p>The wheel, along with all its dependencies, was uploaded to my <strong>private Nexus repository</strong>, making it accessible like any other PyPI package.</p>
</li>
</ul>
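<p>The build-and-publish step can be sketched with standard packaging tools (the repository URL below is a placeholder; the actual pipeline may use different tooling):</p>

```shell
# Build a versioned wheel from the project source
python -m build --wheel        # produces e.g. dist/project_name-0.1.4-py3-none-any.whl

# Publish the artifact to a private Nexus PyPI-hosted repository
twine upload \
  --repository-url https://nexus.example.internal/repository/pypi-internal/ \
  dist/*.whl
```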
<p>For Airflow to access it during runtime, I exposed the Nexus repository securely using <strong>ngrok</strong>, which allowed local or private-network access from the containerized environment.</p>
<h3 id="heading-2-integrating-with-the-airflow-deployment-framework">2. Integrating with the Airflow Deployment Framework</h3>
<p>Once the package was available in Nexus, I used another CI/CD pipeline — the one based on my <strong>Airflow Celery framework</strong> — to automatically:</p>
<ul>
<li><p><strong>Pull the wheel</strong> from Nexus along with any additional dependencies,</p>
</li>
<li><p><strong>Build the custom Airflow image</strong> through the Dockerfile that installs those dependencies dynamically, and</p>
</li>
<li><p><strong>Bring up the entire Airflow environment</strong> via <code>docker compose up -d</code>.</p>
</li>
</ul>
<p>With this, I now had a fully operational and reproducible Airflow setup — built, configured, and ready with just a few clicks.</p>
<h3 id="heading-3-running-training-and-deployment-pipelines">3. Running Training and Deployment Pipelines</h3>
<p>Next, I wrote two DAGs to test the framework’s integration capabilities.</p>
<ul>
<li><p><strong>Training DAG:</strong></p>
<ul>
<li><p>Loads configuration files specifying hyperparameters.</p>
</li>
<li><p>Runs ETL and training steps using the packaged project wheel.</p>
</li>
<li><p>Tracks experiments, metrics, and models using <strong>MLflow</strong>.</p>
</li>
<li><p>Results and artifacts (models, metrics, plots, etc.) appear automatically in the MLflow UI.</p>
</li>
</ul>
</li>
<li><p><strong>Deployment DAG:</strong></p>
<ul>
<li><p>Fetches the required model artifact from a provided <strong>MLflow URI</strong>.</p>
</li>
<li><p>Handles deployment logic — either pushing to cloud, an endpoint, or a designated inference environment.</p>
</li>
</ul>
</li>
</ul>
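<p>Once the environment is up, triggering the two DAGs from the Airflow CLI might look like this (the DAG ids and <code>--conf</code> keys are hypothetical; they depend on how the DAGs are written):</p>

```shell
# Kick off a training run with a specific hyperparameter configuration
airflow dags trigger training_pipeline \
  --conf '{"config_path": "configs/experiment_01.yaml"}'

# Hand a tracked model to the deployment DAG via its MLflow URI
airflow dags trigger deployment_pipeline \
  --conf '{"model_uri": "runs:/<run_id>/model"}'
```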
<p>These two DAGs validated the entire pipeline: from data processing → experiment tracking → artifact management → deployment handoff.</p>
<h3 id="heading-4-the-outcome">4. The Outcome</h3>
<p>By combining these pipelines, I ended up with a <strong>completely automated, reproducible ML experimentation and deployment framework</strong>:</p>
<ul>
<li><p>Jenkins handles build, versioning, and deployment triggers.</p>
</li>
<li><p>Docker + Airflow provide a consistent execution environment.</p>
</li>
<li><p>Nexus acts as the private dependency registry.</p>
</li>
<li><p>MLflow manages experiment tracking and model artifacts.</p>
</li>
</ul>
<p>The best part?<br />It’s modular — I can plug in any ML project following the same structure and get a working Airflow environment with versioned dependencies and clean experiment tracking in minutes.</p>
<hr />
<h2 id="heading-whats-next">What’s Next</h2>
<p>The next part of this series will focus on the <strong>actual project and deployment architecture</strong> — how the training pipeline was structured, how model promotion and validation were handled, and how deployment was automated in the cloud.</p>
<p>For now, this post covers the <strong>framework, setup, and environment</strong> that made all of that possible.</p>
]]></content:encoded></item><item><title><![CDATA[Testing the Nutriguide API]]></title><description><![CDATA[Introduction
An API that uses nutritional data from various products and a user's health status to create a chatbot powered by an LLM. Users can interact with this chatbot to make and plan their dietary choices.
The GitHub Documentation and DockerHub documentation p...]]></description><link>https://techtrail.tuhindutta.com/testing-the-nutriguide-api</link><guid isPermaLink="true">https://techtrail.tuhindutta.com/testing-the-nutriguide-api</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[AI]]></category><category><![CDATA[llm]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Nutrition]]></category><category><![CDATA[APIs]]></category><category><![CDATA[LLaMa]]></category><dc:creator><![CDATA[Tuhin Kumar Dutta]]></dc:creator><pubDate>Sat, 30 Aug 2025 14:15:56 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/x2lMeLwtmFM/upload/21555a6bc30a840aa5a3a12b494b12c2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>The API uses nutritional data from various products and a user's health status to create a chatbot powered by an LLM. Users can interact with this chatbot to make and plan their dietary choices.</p>
<p>The <a target="_blank" href="https://tuhindutta.github.io/nutriguide/">GitHub Documentation</a> and <a target="_blank" href="https://hub.docker.com/r/tkdutta/nutriguide-api">DockerHub documentation</a> provide guides for downloading the container and using it locally. You can also deploy it by following a few simple steps. For more details on using the API, refer to the API documentation in the GitHub documentation.</p>
<h2 id="heading-testing">Testing</h2>
<ol>
<li><p>Install Docker Desktop.</p>
</li>
<li><p>Install <a target="_blank" href="https://www.postman.com/downloads/">Postman</a>.</p>
</li>
<li><p>Follow the user guide in the DockerHub documentation.</p>
</li>
<li><p>Create three <code>POST</code> requests, one for each of these endpoints:</p>
<ul>
<li><p><code>/health</code></p>
</li>
<li><p><code>/products</code></p>
</li>
<li><p><code>/query</code></p>
</li>
</ul>
</li>
<li><p>Send the health status through the <code>/health</code> endpoint. For this example, let’s use:</p>
<pre><code class="lang-json"> {
     <span class="hljs-attr">"health_status"</span>: <span class="hljs-string">"I am a hypertensive patient with a slightly high BP of 130/92. I have diabetes with a sugar level around 135. I am 5 feet 8 inches tall and weigh 78 kg. I want protein."</span>
 }
</code></pre>
</li>
<li><p>Send the nutritional information of the products through the <code>/products</code> endpoint. For this example, let’s use:</p>
<pre><code class="lang-json"> {
     <span class="hljs-attr">"products"</span>: [
         {
             <span class="hljs-attr">"data"</span>: {
                 <span class="hljs-attr">"Nutritional Content"</span>: {
                     <span class="hljs-attr">"Added Sugars"</span>: <span class="hljs-string">"40.5 g"</span>,
                     <span class="hljs-attr">"Carbohydrates"</span>: <span class="hljs-string">"56.5 g"</span>,
                     <span class="hljs-attr">"Cholesterol"</span>: <span class="hljs-string">"13.7 mg"</span>,
                     <span class="hljs-attr">"Energy"</span>: <span class="hljs-string">"559 kcal"</span>,
                     <span class="hljs-attr">"Protein"</span>: <span class="hljs-string">"6.1 g"</span>,
                     <span class="hljs-attr">"Saturated Fat"</span>: <span class="hljs-string">"21.15 g"</span>,
                     <span class="hljs-attr">"Sodium"</span>: <span class="hljs-string">"212 mg"</span>,
                     <span class="hljs-attr">"Total Fat"</span>: <span class="hljs-string">"34.85 g"</span>,
                     <span class="hljs-attr">"Total Sugars"</span>: <span class="hljs-string">"48.6 g"</span>,
                     <span class="hljs-attr">"Trans Fat"</span>: <span class="hljs-string">"0.15 g"</span>
                 }
             },
             <span class="hljs-attr">"heading"</span>: <span class="hljs-string">"Nutritional Content of Cadbury Dairy Milk Silk Oreo per 100g"</span>
         },
         {
             <span class="hljs-attr">"data"</span>: {
                 <span class="hljs-attr">"Nutritional Content"</span>: {
                     <span class="hljs-attr">"Added Sugars"</span>: <span class="hljs-string">"0 g"</span>,
                     <span class="hljs-attr">"Carbohydrates"</span>: <span class="hljs-string">"53 g"</span>,
                     <span class="hljs-attr">"Cholesterol"</span>: <span class="hljs-string">"0 mg"</span>,
                     <span class="hljs-attr">"Energy"</span>: <span class="hljs-string">"536 kcal"</span>,
                     <span class="hljs-attr">"Protein"</span>: <span class="hljs-string">"7 g"</span>,
                     <span class="hljs-attr">"Saturated Fat"</span>: <span class="hljs-string">"3.3 g"</span>,
                     <span class="hljs-attr">"Sodium"</span>: <span class="hljs-string">"525 mg"</span>,
                     <span class="hljs-attr">"Total Fat"</span>: <span class="hljs-string">"34 g"</span>,
                     <span class="hljs-attr">"Total Sugars"</span>: <span class="hljs-string">"1 g"</span>,
                     <span class="hljs-attr">"Trans Fat"</span>: <span class="hljs-string">"0.1 g"</span>
                 }
             },
             <span class="hljs-attr">"heading"</span>: <span class="hljs-string">"Nutritional Content of Lays Classic Salted Chips per 100g"</span>
         },
         {
             <span class="hljs-attr">"data"</span>: {
                 <span class="hljs-attr">"Nutritional Content"</span>: {
                     <span class="hljs-attr">"Added Sugars"</span>: <span class="hljs-string">"10.6 g"</span>,
                     <span class="hljs-attr">"Carbohydrates"</span>: <span class="hljs-string">"10.6 g"</span>,
                     <span class="hljs-attr">"Cholesterol"</span>: <span class="hljs-string">"0 mg"</span>,
                     <span class="hljs-attr">"Energy"</span>: <span class="hljs-string">"42 kcal"</span>,
                     <span class="hljs-attr">"Protein"</span>: <span class="hljs-string">"0 g"</span>,
                     <span class="hljs-attr">"Saturated Fat"</span>: <span class="hljs-string">"0 g"</span>,
                     <span class="hljs-attr">"Sodium"</span>: <span class="hljs-string">"11 mg"</span>,
                     <span class="hljs-attr">"Total Fat"</span>: <span class="hljs-string">"0 g"</span>,
                     <span class="hljs-attr">"Total Sugars"</span>: <span class="hljs-string">"10.6 g"</span>,
                     <span class="hljs-attr">"Trans Fat"</span>: <span class="hljs-string">"0 g"</span>
                 }
             },
             <span class="hljs-attr">"heading"</span>: <span class="hljs-string">"Nutritional Content of Coca-Cola per 100ml"</span>
         },
         {
             <span class="hljs-attr">"data"</span>: {
                 <span class="hljs-attr">"Nutritional Content"</span>: {
                     <span class="hljs-attr">"Added Sugars"</span>: <span class="hljs-string">"2.1 g"</span>,
                     <span class="hljs-attr">"Carbohydrates"</span>: <span class="hljs-string">"60 g"</span>,
                     <span class="hljs-attr">"Cholesterol"</span>: <span class="hljs-string">"0 mg"</span>,
                     <span class="hljs-attr">"Energy"</span>: <span class="hljs-string">"445 kcal"</span>,
                     <span class="hljs-attr">"Protein"</span>: <span class="hljs-string">"9 g"</span>,
                     <span class="hljs-attr">"Saturated Fat"</span>: <span class="hljs-string">"8.1 g"</span>,
                     <span class="hljs-attr">"Sodium"</span>: <span class="hljs-string">"880 mg"</span>,
                     <span class="hljs-attr">"Total Fat"</span>: <span class="hljs-string">"17 g"</span>,
                     <span class="hljs-attr">"Total Sugars"</span>: <span class="hljs-string">"3 g"</span>,
                     <span class="hljs-attr">"Trans Fat"</span>: <span class="hljs-string">"0.1 g"</span>
                 }
             },
             <span class="hljs-attr">"heading"</span>: <span class="hljs-string">"Nutritional Content of Maggi 2-Minute Noodles per 100g"</span>
         },
         {
             <span class="hljs-attr">"data"</span>: {
                 <span class="hljs-attr">"Nutritional Content"</span>: {
                     <span class="hljs-attr">"Added Sugars"</span>: <span class="hljs-string">"0 g"</span>,
                     <span class="hljs-attr">"Carbohydrates"</span>: <span class="hljs-string">"0 g"</span>,
                     <span class="hljs-attr">"Cholesterol"</span>: <span class="hljs-string">"215 mg"</span>,
                     <span class="hljs-attr">"Energy"</span>: <span class="hljs-string">"717 kcal"</span>,
                     <span class="hljs-attr">"Protein"</span>: <span class="hljs-string">"0.9 g"</span>,
                     <span class="hljs-attr">"Saturated Fat"</span>: <span class="hljs-string">"51 g"</span>,
                     <span class="hljs-attr">"Sodium"</span>: <span class="hljs-string">"714 mg"</span>,
                     <span class="hljs-attr">"Total Fat"</span>: <span class="hljs-string">"81 g"</span>,
                     <span class="hljs-attr">"Total Sugars"</span>: <span class="hljs-string">"0 g"</span>,
                     <span class="hljs-attr">"Trans Fat"</span>: <span class="hljs-string">"3 g"</span>
                 }
             },
             <span class="hljs-attr">"heading"</span>: <span class="hljs-string">"Nutritional Content of Amul Butter per 100g"</span>
         }
     ]
 }
</code></pre>
</li>
<li><p>Now, we can ask questions using the <code>/query</code> endpoint as shown below, with all chat history stored and tracked for context awareness:</p>
<pre><code class="lang-json"> {
     <span class="hljs-attr">"query"</span>: <span class="hljs-string">"What should I take for protein?"</span>
 }
</code></pre>
</li>
</ol>
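<p>If you prefer the command line to Postman, the same three requests can be sent with <code>curl</code> (the host and port below are assumptions; use whatever mapping your <code>docker run</code> command exposes):</p>

```shell
BASE=http://localhost:8000    # adjust to your container's mapped port

# 1. Send the health status
curl -X POST "$BASE/health" \
  -H "Content-Type: application/json" \
  -d '{"health_status": "I am a hypertensive patient ..."}'

# 2. Send the product nutrition data (payload saved as products.json)
curl -X POST "$BASE/products" \
  -H "Content-Type: application/json" \
  -d @products.json

# 3. Ask a question against the stored context
curl -X POST "$BASE/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "What should I take for protein?"}'
```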
<h2 id="heading-a-conversation-session-example">A conversation session example</h2>
<p><strong>Q1— What should I take for protein?</strong></p>
<blockquote>
<p>Here are the protein contents of the given foods:</p>
<ol>
<li><p>Cadbury Dairy Milk Silk Oreo - 6.1 g</p>
</li>
<li><p>Lays Classic Salted Chips - 7 g</p>
</li>
<li><p>Coca-Cola - 0 g</p>
</li>
<li><p>Maggi 2-Minute Noodles - 9 g</p>
</li>
<li><p>Amul Butter - 0.9 g</p>
</li>
</ol>
<p>To meet your protein needs, consider options like Maggi 2-Minute Noodles, Lays Classic Salted Chips, or Cadbury Dairy Milk Silk Oreo. However, please keep in mind your hypertension and diabetes; I will provide more tailored suggestions if needed.</p>
</blockquote>
<p><strong>Q2— Is it okay to have it in terms of fat content?</strong></p>
<blockquote>
<p>Considering your hypertension and diabetes, let's evaluate the fat content of the high-protein options:</p>
<ol>
<li><p>Maggi 2-Minute Noodles - 17 g (Total Fat), 8.1 g (Saturated Fat)</p>
</li>
<li><p>Lays Classic Salted Chips - 34 g (Total Fat), 3.3 g (Saturated Fat)</p>
</li>
<li><p>Cadbury Dairy Milk Silk Oreo - 34.85 g (Total Fat), 21.15 g (Saturated Fat)</p>
</li>
</ol>
<p>To manage your condition, it's essential to balance your fat intake. Lays Classic Salted Chips and Cadbury Dairy Milk Silk Oreo have high fat content. Maggi 2-Minute Noodles seem to be a relatively better option, but its sodium content (880 mg) is a concern for hypertension.</p>
<p>Considering both protein and fat content, Maggi 2-Minute Noodles might be a compromise, but please be mindful of its sodium and overall carbohydrate content (60 g) that may impact your diabetes.</p>
</blockquote>
<p><strong>Q3— How much of your recommendation and Coke should I consume?</strong></p>
<blockquote>
<p>I can't recommend consuming Coca-Cola due to its high sugar content (10.6 g per 100ml), which can worsen your diabetes.</p>
<p>For Maggi 2-Minute Noodles, considering your hypertension, diabetes, and protein needs, I suggest a moderate serving size. A 100g serving of Maggi 2-Minute Noodles provides 9g of protein, 17g of fat, 60g of carbohydrates, and 880mg of sodium.</p>
<p>To balance your nutrient intake, limit your serving size to 50-70g (approximately half to two-thirds of a standard packet), and consider adding vegetables or protein sources with lower sodium and fat content to make it a more balanced meal.</p>
<p>Please consult a healthcare professional or registered dietitian for personalized dietary advice.</p>
</blockquote>
<p><strong>Q4— BTW which one is the least harmful?</strong></p>
<blockquote>
<p>Considering your hypertension and diabetes, I would suggest Lays Classic Salted Chips as the least harmful option among the ones listed.</p>
<p>It provides 7g of protein and has relatively lower sodium (525mg) compared to Maggi 2-Minute Noodles. However, please be mindful of its high fat content (34g) and limit your serving size.</p>
<p>Keep in mind that it's essential to balance your diet with a variety of nutrient-dense foods and consult a healthcare professional for personalized advice.</p>
</blockquote>
<h2 id="heading-future-prospectshttpstuhinduttagithubionutriguidefuture-prospect"><a target="_blank" href="https://tuhindutta.github.io/nutriguide/future-prospect">Future Prospects</a></h2>
]]></content:encoded></item><item><title><![CDATA[Testing Smart Nutrition API]]></title><description><![CDATA[Introduction
A smart nutrition API that fetches nutritional information from pictures of labelled products. As we will see, the API returns nutritional metrics reasonably close to the actual labels, with an error rate of around 3.0%.
The techn...]]></description><link>https://techtrail.tuhindutta.com/testing-smart-nutrition-api</link><guid isPermaLink="true">https://techtrail.tuhindutta.com/testing-smart-nutrition-api</guid><category><![CDATA[APIs]]></category><category><![CDATA[Flask Framework]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[AI]]></category><category><![CDATA[llm]]></category><category><![CDATA[gemini]]></category><category><![CDATA[Nutrition]]></category><category><![CDATA[analytics]]></category><category><![CDATA[Governance]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Devops]]></category><dc:creator><![CDATA[Tuhin Kumar Dutta]]></dc:creator><pubDate>Tue, 26 Aug 2025 18:53:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/kcRFW-Hje8Y/upload/dc9cd0d2bbe2b0f1835c63120cf225e1.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>This is a smart nutrition API that fetches nutritional information from pictures of labelled products. As we will see below, the API returns nutritional metrics reasonably close to the actual labels, with an error rate of around <strong>3.0%</strong>.</p>
<p>The technical details are covered in the <a target="_blank" href="https://tuhindutta.github.io/smart-nutrition-api/">GitHub</a> documentation, and the Docker image with usage instructions is available on <a target="_blank" href="https://hub.docker.com/r/tkdutta/smart-nutrition-api">DockerHub</a>. There is also a comprehensive, experimental <a target="_blank" href="https://techtrail.tuhindutta.com/minikube-cluster-setup">guide</a> to set up a Kubernetes cluster and deploy the API container in a pod.</p>
<p>Following are minimalistic examples of <code>deployment.yaml</code> and <code>service.yaml</code> for K8s deployment:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># deployment.yaml - fill in the required API keys as environment variables.</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nutrition-api</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">nutrition-api</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">nutrition-api</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">nutrition-api</span>
        <span class="hljs-attr">image:</span> <span class="hljs-string">tkdutta/smart-nutrition-api:v1</span>
        <span class="hljs-attr">ports:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">containerPort:</span> <span class="hljs-number">5000</span>
        <span class="hljs-attr">env:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">GOOGLE_API_KEY</span>
          <span class="hljs-attr">value:</span> <span class="hljs-string">""</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">GROQ_API_KEY</span>
          <span class="hljs-attr">value:</span> <span class="hljs-string">""</span>
</code></pre>
<pre><code class="lang-yaml"><span class="hljs-comment"># service.yaml</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nutrition-api-service</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">NodePort</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">nutrition-api</span>
  <span class="hljs-attr">ports:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">port:</span> <span class="hljs-number">80</span>
      <span class="hljs-attr">targetPort:</span> <span class="hljs-number">5000</span>  <span class="hljs-comment"># must match the containerPort of the deployment</span>
      <span class="hljs-attr">nodePort:</span> <span class="hljs-number">30007</span>
</code></pre>
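<p>With the two manifests saved, a typical apply-and-verify sequence on Minikube looks roughly like this:</p>

```shell
# Create (or update) the deployment and the service
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml

# Confirm the pod and service are up
kubectl get pods -l app=nutrition-api
kubectl get svc nutrition-api-service

# Minikube prints a URL that reaches the NodePort service
minikube service nutrition-api-service --url
```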
<h2 id="heading-short-description-of-the-endpoints">Short description of the endpoints</h2>
<p>Here is a quick refresher on the endpoints of the API (with more details in the above-mentioned documentation):</p>
<ol>
<li><p><code>/product-name</code>: Extract product name from an uploaded image.</p>
<ul>
<li><p><strong>Payload</strong>: Form-data with key “product” as the image file.</p>
</li>
<li><p><strong>Returns</strong>: <code>json</code> file with the extracted product name.</p>
</li>
</ul>
</li>
<li><p><code>/nutrition</code>: Get nutritional information for a given product name.</p>
<ul>
<li><p><strong>Payload</strong>: Value from the <code>/product-name</code> output.</p>
</li>
<li><p><strong>Returns</strong>: <code>json</code> file with its nutritional values.</p>
</li>
</ul>
</li>
</ol>
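<p>As a quick command-line check, both endpoints can be exercised with <code>curl</code> (the port and the <code>/nutrition</code> payload shape here are assumptions based on the deployment manifest and sample outputs in this post):</p>

```shell
# Extract the product name from a label image (multipart form upload)
curl -X POST -F "product=@label.png" http://localhost:5000/product-name

# Fetch nutrition details for the extracted product name
curl -X POST http://localhost:5000/nutrition \
  -H "Content-Type: application/json" \
  -d '{"name": "dairymilk silk oreo"}'
```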
<h2 id="heading-using-the-api">Using the API</h2>
<ol>
<li><p>Pass the following image to the <code>/product-name</code> endpoint.</p>
<p> <a target="_blank" href="https://www.amazon.in/Cadbury-Dairy-Milk-Silk-Oreo/dp/B01N3B4HGP/ref=sr_1_2_f3_0g_fs_sspa?crid=1J0V0IGL35HNM&amp;dib=eyJ2IjoiMSJ9.7CJJZeospqGY2g7TzwfcuwRAcw4FVyo37CMFg06YOp2S2YXo0C633PtLwyLnBau5Jy4tkHY-gdfIQCZXEzgXwpqfYZeIi0qu6_MTT3f2TYKidWavHWGRexPnFx6mpBk3JmWdeTaVM3igdATyMvvquxK5nGhgwbctcvXwD6QJ407PsasV46HFeha04oheY9jp0tMXG7-QBX2GlLzJmQ67USW21CNUe9zMTdQn7E86frax3E2WMgaRBx_QMZ5OxCOhBtiheQqd0ck9hd8zrB5yt_OLZJZDRskPsrHzgM6CoYE.6VtVkBve5gfk3AEO0mvDjlxUvG3eICxY4jQ_RYBU0oE&amp;dib_tag=se&amp;keywords=dairy%2Bmilk&amp;qid=1756231912&amp;s=grocery&amp;sprefix=dairy%2Bmilk%2Cgrocery%2C212&amp;sr=1-2-spons&amp;sp_csd=d2lkZ2V0TmFtZT1zcF9hdGY&amp;th=1"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756232210996/a9fa7105-7efe-4855-a43b-4c23c51cbd66.png" alt="Source: Amazon (Cadbury Dairy Milk Silk Oreo Chocolate Bars, 58.5 g)" class="image--center mx-auto" /></a></p>
</li>
<li><p>We get the following <code>json</code> output based on the above image provided:</p>
<pre><code class="lang-json"> {
     <span class="hljs-attr">"name"</span>: <span class="hljs-string">"dairymilk silk oreo"</span>
 }
</code></pre>
</li>
<li><p>Pass the above <em>product name</em> to the <code>/nutrition</code> endpoint to get the following output:</p>
<pre><code class="lang-json"> {
     <span class="hljs-attr">"data"</span>: {
         <span class="hljs-attr">"Nutritional Content"</span>: {
             <span class="hljs-attr">"Added Sugars"</span>: <span class="hljs-string">"40.5 g"</span>,
             <span class="hljs-attr">"Carbohydrates"</span>: <span class="hljs-string">"56.5 g"</span>,
             <span class="hljs-attr">"Cholesterol"</span>: <span class="hljs-string">"13.7 mg"</span>,
             <span class="hljs-attr">"Energy"</span>: <span class="hljs-string">"559 kcal"</span>,
             <span class="hljs-attr">"Protein"</span>: <span class="hljs-string">"6.1 g"</span>,
             <span class="hljs-attr">"Saturated Fat"</span>: <span class="hljs-string">"21.15 g"</span>,
             <span class="hljs-attr">"Sodium"</span>: <span class="hljs-string">"212 mg"</span>,
             <span class="hljs-attr">"Total Fat"</span>: <span class="hljs-string">"34.85 g"</span>,
             <span class="hljs-attr">"Total Sugars"</span>: <span class="hljs-string">"48.6 g"</span>,
             <span class="hljs-attr">"Trans Fat"</span>: <span class="hljs-string">"0.15 g"</span>
         }
     },
     <span class="hljs-attr">"heading"</span>: <span class="hljs-string">"Nutritional Content of Cadbury Dairy Milk Silk Oreo per 100g"</span>
 }
</code></pre>
</li>
</ol>
<h2 id="heading-validating-the-output">Validating the output</h2>
<p>Let’s validate the output with the actual label in the product:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756232372436/35e8bcdd-9444-4d8d-91f5-762a1bc94734.png" alt class="image--center mx-auto" /></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Nutritional contents</td><td>API output (per 100 g)</td><td>Label (per 100 g)</td><td>Error %</td></tr>
</thead>
<tbody>
<tr>
<td>Energy</td><td>559.0 kcal</td><td>563.0 kcal</td><td>0.7 %</td></tr>
<tr>
<td>Carbohydrates</td><td>56.5 g</td><td>56.9 g</td><td>0.7 %</td></tr>
<tr>
<td>Protein</td><td>6.1 g</td><td>6.0 g</td><td>-1.67 %</td></tr>
<tr>
<td>Sugars</td><td>48.6 g</td><td>48.9 g</td><td>0.6 %</td></tr>
<tr>
<td>Total Fat</td><td>34.85 g</td><td>34.8 g</td><td>-0.14 %</td></tr>
<tr>
<td>Saturated Fat</td><td>21.15 g</td><td>22.3 g</td><td>5.16 %</td></tr>
<tr>
<td>Sodium</td><td>212.0 mg</td><td>207.0 mg</td><td>-2.42 %</td></tr>
<tr>
<td>Cholesterol</td><td>13.7 mg</td><td>17.4 mg</td><td>21.26 %</td></tr>
</tbody>
</table>
</div><p>Therefore, the <strong>Mean Absolute Percentage Error (MAPE)</strong> of the above is <strong>~4.1%</strong>.</p>
<p>Performing the same analysis for multiple products across various brands gives an average error rate of <strong>~3.0%</strong>.</p>
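<p>The per-nutrient figures above follow the usual percentage-error formula, <code>|api - label| / label * 100</code>, and MAPE is simply the mean of those absolute errors. As a quick check on the first two rows of the table:</p>

```shell
# Percentage error for the Energy and Carbohydrates rows (API vs. label values)
energy_err=$(awk 'BEGIN { printf "%.2f", (563 - 559) / 563 * 100 }')
carb_err=$(awk 'BEGIN { printf "%.2f", (56.9 - 56.5) / 56.9 * 100 }')
echo "Energy: ${energy_err}%  Carbohydrates: ${carb_err}%"
```

<p>Both come out to roughly 0.7%, matching the table.</p>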
<h2 id="heading-governance-trust-amp-disclaimer">Governance, Trust &amp; Disclaimer</h2>
<ol>
<li><p>Nutrition data is approximate and may differ from actual food content. Use as guidance, not as medical advice.</p>
</li>
<li><p><strong>Not a medical-grade app</strong>. For general use only; it is not to be considered an ultimate source of truth without proper research.</p>
</li>
<li><p>Data is extracted through general internet scraping.</p>
</li>
<li><p>To be treated like a <strong>navigation app</strong> - great for guidance, but <strong>not a substitute for a doctor’s prescription or lab-tested food assay</strong>.</p>
</li>
</ol>
<h2 id="heading-future-prospectshttpstuhinduttagithubionutriguidefuture-prospect"><a target="_blank" href="https://tuhindutta.github.io/nutriguide/future-prospect">Future Prospects</a></h2>
]]></content:encoded></item><item><title><![CDATA[Minikube Cluster Setup]]></title><description><![CDATA[The following guide specifically focusses on the setup of Linux (Ubuntu Server) guest in Windows 11 host.


Install and configure Oracle VirtualBox.
 

Download Ubuntu Server ISO.
 

Install the Ubuntu Server ISO in VirtualBox with the following syste...]]></description><link>https://techtrail.tuhindutta.com/minikube-cluster-setup</link><guid isPermaLink="true">https://techtrail.tuhindutta.com/minikube-cluster-setup</guid><category><![CDATA[Devops]]></category><category><![CDATA[mlops]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[containers]]></category><category><![CDATA[Docker]]></category><category><![CDATA[data-engineering]]></category><category><![CDATA[minikube]]></category><category><![CDATA[VirtualBox ]]></category><category><![CDATA[virtual machine]]></category><category><![CDATA[k8s]]></category><dc:creator><![CDATA[Tuhin Kumar Dutta]]></dc:creator><pubDate>Sun, 27 Jul 2025 19:11:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/ZfVyuV8l7WU/upload/547a3e145570f1c850c40dc375e2f6dd.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>The following guide specifically focuses on the setup of a Linux (Ubuntu Server) guest in a Windows 11 host.</p>
</blockquote>
<ol>
<li><p>Install and configure <a target="_blank" href="https://www.virtualbox.org/wiki/Downloads">Oracle VirtualBox</a>.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753639165620/6df1ff4c-7166-4e02-b3ce-61ccc2725f6c.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Download <a target="_blank" href="https://ubuntu.com/download/server">Ubuntu Server ISO</a>.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753639222443/f0e16f7b-b57d-40b8-bb8a-2e2a5c64be53.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Install Ubuntu Server from the ISO in VirtualBox, with the following system requirements for the VDI created:</p>
<ul>
<li><p>Minimum disk space: 25 GB</p>
</li>
<li><p>Minimum Memory: 4 GB</p>
</li>
<li><p>Minimum number of CPU cores: 2</p>
</li>
<li><p>Network: <strong><em>NAT</em></strong> / <strong><em>Bridged Adapter</em></strong> (Recommended)</p>
</li>
</ul>
</li>
<li><p>For a <strong><em>NAT</em></strong> network:</p>
<p> Click the ‘<strong><em>Port Forwarding</em></strong>’ button and configure it with the following details. This is required to perform <code>ssh</code> from the Windows terminal.</p>
</li>
</ol>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Name</td><td>Protocol</td><td>Host IP</td><td>Host Port</td><td>Guest IP</td><td>Guest Port</td></tr>
</thead>
<tbody>
<tr>
<td>ssh</td><td>TCP</td><td></td><td>3022</td><td></td><td>22</td></tr>
</tbody>
</table>
</div><p>For a <strong><em>Bridged Adapter</em></strong> network, find the guest machine’s IP address using the following command:</p>
<pre><code class="lang-bash">ip a
</code></pre>
<ol start="5">
<li><p>Boot and log into the VM, then run the following commands:</p>
<pre><code class="lang-bash"> <span class="hljs-comment"># Update the OS</span>
 sudo apt update &amp;&amp; sudo apt upgrade

 <span class="hljs-comment"># Install openssh-server</span>
 sudo apt install openssh-server

 <span class="hljs-comment"># Enable the ssh service using systemctl</span>
 sudo systemctl <span class="hljs-built_in">enable</span> ssh --now

 <span class="hljs-comment"># Check if the service is enabled</span>
 sudo systemctl status ssh
</code></pre>
<p> If the ssh service is enabled, the following status is displayed:</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753640726189/e56411b8-35c2-45fd-9d3d-dfb34b983e6f.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Now we can <code>ssh</code> in from the host terminal using the following command and work with the VM much as we would with a remote cloud server:</p>
<pre><code class="lang-bash"> <span class="hljs-comment"># NAT network</span>
 ssh -p 3022 vboxuser@127.0.0.1

 <span class="hljs-comment"># Bridged Adapter network</span>
 ssh vboxuser@192.xxx.x.x
</code></pre>
</li>
<li><p>Now, perform the following steps to install <code>docker</code> and <code>minikube</code>. All of these steps are covered in the <a target="_blank" href="https://minikube.sigs.k8s.io/docs/start/?arch=%2Fwindows%2Fx86-64%2Fstable%2F.exe+download">documentation of minikube</a>; what follows is a simplified walkthrough.</p>
</li>
</ol>
<h3 id="heading-docker-installation">Docker Installation</h3>
<p>A container or virtual machine manager is required; here we will use <strong>docker</strong>. The steps below follow the <a target="_blank" href="https://docs.docker.com/engine/install/ubuntu/">docker documentation</a>:</p>
<ol>
<li><p>Uninstall the unofficial packages:</p>
<pre><code class="lang-bash"> <span class="hljs-keyword">for</span> pkg <span class="hljs-keyword">in</span> docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; <span class="hljs-keyword">do</span> sudo apt-get remove <span class="hljs-variable">$pkg</span>; <span class="hljs-keyword">done</span>
</code></pre>
</li>
<li><p>Install using <code>apt-repository</code>:</p>
<pre><code class="lang-bash"> <span class="hljs-comment"># Add Docker's official GPG key:</span>
 sudo apt-get update
 sudo apt-get install ca-certificates curl
 sudo install -m 0755 -d /etc/apt/keyrings
 sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
 sudo chmod a+r /etc/apt/keyrings/docker.asc

 <span class="hljs-comment"># Add the repository to Apt sources:</span>
 <span class="hljs-built_in">echo</span> \
   <span class="hljs-string">"deb [arch=<span class="hljs-subst">$(dpkg --print-architecture)</span> signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
   <span class="hljs-subst">$(. /etc/os-release &amp;&amp; echo <span class="hljs-string">"<span class="hljs-variable">${UBUNTU_CODENAME:-<span class="hljs-variable">$VERSION_CODENAME</span>}</span>"</span>)</span> stable"</span> | \
   sudo tee /etc/apt/sources.list.d/docker.list &gt; /dev/null
 sudo apt-get update
</code></pre>
</li>
<li><p>Install the Docker packages:</p>
<pre><code class="lang-bash"> sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
</code></pre>
</li>
<li><p>Verify the installation:</p>
<pre><code class="lang-bash"> sudo docker run hello-world
</code></pre>
</li>
<li><p>Add the current user to the <code>docker</code> group with the following command. This step is a prerequisite for starting <code>minikube</code>:</p>
<pre><code class="lang-bash"> sudo usermod -aG docker <span class="hljs-variable">$USER</span> &amp;&amp; newgrp docker
</code></pre>
</li>
</ol>
<h3 id="heading-minikube-installation">Minikube Installation</h3>
<p>In the previously mentioned <a target="_blank" href="https://minikube.sigs.k8s.io/docs/start/?arch=%2Flinux%2Fx86-64%2Fstable%2Fbinary+download">minikube documentation</a>, in the ‘Installation’ section, select the buttons to describe the target platform. For our case, it is:</p>
<ul>
<li><p>Operating System: Linux</p>
</li>
<li><p>Architecture: x86-64</p>
</li>
<li><p>Release type: Stable</p>
</li>
<li><p>Installer type: Debian package</p>
</li>
</ul>
<p>Perform the following steps to install <code>minikube</code>.</p>
<ol>
<li><p>After selecting the relevant options, the command to install the <code>minikube</code> package appears below the ‘Installation’ section; run it in the terminal:</p>
<pre><code class="lang-bash"> curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube_latest_amd64.deb
 sudo dpkg -i minikube_latest_amd64.deb
</code></pre>
</li>
<li><p>Start the cluster:</p>
<pre><code class="lang-bash"> minikube start
</code></pre>
</li>
<li><p>Install <code>kubectl</code>:</p>
<pre><code class="lang-bash"> sudo snap install kubectl --classic
</code></pre>
</li>
</ol>
<p>We now have a development-ready Kubernetes cluster that can be used for hosting, serving, and testing small projects.</p>
]]></content:encoded></item><item><title><![CDATA[Speech Refiner]]></title><description><![CDATA[In the digital age, the way we communicate in writing matters more than ever. Whether it’s an email, a feedback form or a social media comment. Tone and politeness can make or even break a message. That is why, here is a free to use solution of Speec...]]></description><link>https://techtrail.tuhindutta.com/speech-refiner</link><guid isPermaLink="true">https://techtrail.tuhindutta.com/speech-refiner</guid><category><![CDATA[LLM-powered speech refiner]]></category><category><![CDATA[language tone enhancer]]></category><category><![CDATA[Groq LLM integration]]></category><category><![CDATA[Flask AI backend]]></category><category><![CDATA[speech to text]]></category><category><![CDATA[Audio Transcription]]></category><category><![CDATA[AI]]></category><category><![CDATA[#ai-tools]]></category><dc:creator><![CDATA[Tuhin Kumar Dutta]]></dc:creator><pubDate>Fri, 18 Jul 2025 14:51:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/2dA2zWv0A8o/upload/f4661fc12f1286cf7c548471b33f1ff5.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the digital age, the way we communicate in writing matters more than ever, whether it’s an email, a feedback form, or a social media comment. Tone and politeness can make or break a message. That is why <strong>Speech Refiner</strong> exists: a free-to-use, lightweight AI tool designed to improve and soften any written communication.</p>
<p>Instead of delving into the mathematics behind the scenes, this post focuses only on the practical implementation and deployment needed to deliver a ready-to-use tool.</p>
<h2 id="heading-motivation">Motivation</h2>
<p>We’ve all been there - waking up to promotional emails, automated follow-ups, or vague LinkedIn messages asking for “a quick connect.” While many of these messages border on spam, we often still need to respond, especially in professional settings without sounding rude or dismissive.</p>
<p>In day-to-day communication, whether it's replying to persistent sales emails, addressing unsolicited collaboration requests, or giving constructive feedback to a colleague, tone plays a crucial role. A blunt or emotionally charged message can harm relationships or escalate misunderstandings.</p>
<p>That’s where the idea for <strong>Speech Refiner</strong> came from:</p>
<blockquote>
<p>A tool that helps you express what you mean - firmly if needed but in a polite, professional tone.</p>
</blockquote>
<p>In a world driven by remote communication and asynchronous messaging, being clear without being curt is not just good etiquette but it’s a skill. Speech Refiner aims to make that easier, one sentence at a time.</p>
<blockquote>
<p><strong>Q: Why not just use ChatGPT?</strong><br />A: This tool offers a high degree of customization and full control to the user. It follows a plug-and-play architecture, allowing users to seamlessly integrate any local or preferred language models into the backend as per their requirements.</p>
</blockquote>
<h2 id="heading-project-overview">Project Overview</h2>
<p>The system is composed of two parts:</p>
<ul>
<li><p><strong>Frontend</strong>: An intuitive interface where users can <strong>either record their voice</strong> or <strong>upload a pre-recorded audio clip</strong> containing a message they wish to refine. <em><mark>(</mark></em><a target="_blank" href="https://tuhindutta.github.io/Speech-Refiner/"><em><mark>GitHub documentation</mark></em></a><em><mark>)</mark></em></p>
</li>
<li><p><strong>Backend</strong>: An LLM-powered custom API that receives recordings from the frontend and rewrites them with a softened tone, courtesy phrases, and contextual phrasing. <em><mark>(</mark></em><a target="_blank" href="https://tuhindutta.github.io/Politeness-Engine-API-Backend/#-endpoint-specification"><em><mark>GitHub documentation</mark></em></a><em><mark>)</mark></em></p>
</li>
</ul>
<p>Whether it's a voice note, a spontaneous thought, or a casual remark, Speech Refiner transforms it into a professional, composed version - ready to be used in emails, meetings, or digital conversations. It bridges the gap between natural speech and formal communication, helping users convey their intent with clarity and courtesy.</p>
<h2 id="heading-application-screenshot">Application Screenshot</h2>
<p><img src="https://github.com/user-attachments/assets/82a46653-ac8c-49bd-9c26-27aafba944cc" alt="image" /></p>
<blockquote>
<p><mark>The focus will primarily be on the backend, given the AI-centric nature of our discussion, though we’ll also briefly touch on the frontend toward the end.</mark></p>
</blockquote>
<h2 id="heading-backend">Backend</h2>
<p>The backbone of the <strong>Speech Refiner</strong> application is a robust, modular, and production-ready backend service. Designed with a focus on <strong>speech-to-text transcription</strong> and <strong>language refinement</strong>, the backend plays a critical role in transforming voice inputs into polished, professional responses.</p>
<p>Let’s take a look at how it works under the hood.</p>
<blockquote>
<p><mark>The backend API GitHub repository has been kept private to prevent misuse, such as unauthorized API calls or abuse.</mark></p>
</blockquote>
<h3 id="heading-overview">Overview</h3>
<p>The <strong>Politeness Engine Backend</strong> is a RESTful API built using <strong>Flask</strong>, designed to handle audio files submitted from the frontend. It performs two key tasks:</p>
<ol>
<li><p><strong>Transcribes speech</strong> from the uploaded audio using a speech recognition engine.</p>
</li>
<li><p><strong>Refines the transcribed text</strong> using a Large Language Model (LLM) hosted on Groq.</p>
</li>
</ol>
<p>This backend is lightweight, scalable, and secure, making it ready for integration into real-world applications, whether as a web client, desktop interface, or mobile app.</p>
<h3 id="heading-system-architecture">System Architecture</h3>
<p>The core system is organized across three main files:</p>
<ul>
<li><p><code>main.py</code><br />  Acts as the entry point of the application. It defines the <code>/upload</code> endpoint, handles file uploads, manages temporary file storage, and orchestrates the processing flow.</p>
</li>
<li><p><code>utils.py</code><br />  Contains two utility classes:</p>
<ul>
<li><p><code>Transcription</code>: Converts audio input into text using the <a target="_blank" href="https://pypi.org/project/SpeechRecognition/">Google Speech Recognition API</a> via the <code>speech_recognition</code> package.</p>
</li>
<li><p><code>LLM</code>: Sends the transcribed text to a <strong>Groq-hosted LLM API</strong> with a predefined prompt, returning an enhanced, more polite version of the message.</p>
</li>
</ul>
</li>
<li><p><code>requirements.txt</code><br />  Lists all dependencies required to run the service, including Flask, CORS handling, rate limiting, and audio/LLM utilities.</p>
</li>
</ul>
<h3 id="heading-api-endpoint-upload">API Endpoint: <code>/upload</code></h3>
<p>The backend exposes a single public endpoint:</p>
<ul>
<li><p><strong>Method</strong>: <code>POST</code></p>
</li>
<li><p><strong>Route</strong>: <code>/upload</code></p>
</li>
<li><p><strong>Content-Type</strong>: <code>multipart/form-data</code></p>
</li>
<li><p><strong>Form Field</strong>: <code>audio</code> - Accepts audio files (WAV)</p>
</li>
<li><h4 id="heading-request-flow">Request Flow:</h4>
<p>  1. User uploads an audio file from the frontend interface.</p>
<p>  2. File is saved temporarily in the <code>uploads/</code> directory.</p>
<p>  3. Audio is loaded into memory using <a target="_blank" href="http://scipy.io"><code>scipy.io</code></a><code>.wavfile</code>.</p>
<p>  4. The <code>Transcription</code> module converts the audio into raw text.</p>
<p>  5. The text is truncated to 100 words to manage LLM input token limits.</p>
<p>  6. The <code>LLM</code> module sends the text to the Groq API for refinement.</p>
<p>  7. Refined output is returned alongside the original transcription.</p>
<p>  8. The temporary file is deleted to maintain a clean and secure environment.</p>
</li>
</ul>
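<p>Step 5 above, trimming the transcript to respect LLM token limits, is worth a quick illustration. A minimal sketch in Python (the function name is illustrative, not the backend's actual helper):</p>
<pre><code class="lang-python"># keep only the first `limit` words of the transcription (step 5 above)
def truncate_words(text, limit=100):
    """Drop everything after the first `limit` whitespace-separated words."""
    return " ".join(text.split()[:limit])

sample = "word " * 120                       # a 120-word transcript
print(len(truncate_words(sample).split()))   # 100
</code></pre>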
<h3 id="heading-successful-response-200-ok">Successful Response (200 OK)</h3>
<pre><code class="lang-json">{
  <span class="hljs-attr">"input"</span>: <span class="hljs-string">"Raw transcribed text"</span>,
  <span class="hljs-attr">"output"</span>: <span class="hljs-string">"Refined and polite version of the message"</span>
}
</code></pre>
<h3 id="heading-error-handling">Error Handling</h3>
<ul>
<li><p>If no audio file is provided, a <code>400 Bad Request</code> is returned with a helpful error message.</p>
</li>
<li><p>In case of unexpected issues (e.g., invalid formats, API failures), a <code>500 Internal Server Error</code> is returned with the error trace for debugging.</p>
</li>
</ul>
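<p>The success and error paths above amount to a small contract. The sketch below models it as a plain function, with <code>transcribe</code> and <code>refine</code> standing in for the backend's <code>Transcription</code> and <code>LLM</code> modules; the error messages are illustrative, not the service's exact wording:</p>
<pre><code class="lang-python">def handle_upload(files, transcribe, refine):
    """Return (json_body, http_status) following the /upload contract."""
    if "audio" not in files:
        return {"error": "no audio file provided in form field 'audio'"}, 400
    try:
        text = transcribe(files["audio"])
        return {"input": text, "output": refine(text)}, 200
    except Exception as exc:  # invalid format, transcription or LLM API failure
        return {"error": str(exc)}, 500

body, status = handle_upload({}, None, None)
print(status)  # 400
body, status = handle_upload({"audio": b"RIFF..."},
                             lambda audio: "send the report",
                             lambda text: "Could you please send the report?")
print(status, body["output"])  # 200 Could you please send the report?
</code></pre>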
<h3 id="heading-rate-limiting-amp-security">Rate Limiting &amp; Security</h3>
<p>To prevent abuse and ensure fair usage, the backend implements <strong>IP-based rate limiting</strong> using <code>Flask-Limiter</code>. Limits are:</p>
<ul>
<li><p><strong>10 requests per minute</strong></p>
</li>
<li><p><strong>150 requests per day</strong></p>
</li>
</ul>
<p>Rate limits are stored in <strong>Redis</strong>, managed through:</p>
<ul>
<li><p><strong>Redis Hosting</strong>: <a target="_blank" href="https://upstash.com/">Upstash</a></p>
</li>
<li><p><strong>Configuration</strong>: Passed via <code>REDIS_URI</code> environment variable</p>
</li>
<li><p><strong>Fallback</strong>: Defaults to in-memory storage if Redis is unavailable</p>
</li>
</ul>
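<p>Conceptually, Flask-Limiter records recent request timestamps per client IP and rejects a request once the window is full. A small in-memory sliding-window limiter illustrates the idea (a sketch of the concept, not the library's actual implementation):</p>
<pre><code class="lang-python">import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds per client IP."""

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        while q and now - q[0] >= self.window_s:  # drop hits outside the window
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

limiter = SlidingWindowLimiter(limit=10, window_s=60.0)
print(all(limiter.allow("203.0.113.7", now=float(i)) for i in range(10)))  # True
print(limiter.allow("203.0.113.7", now=10.5))   # False: 11th request inside the window
print(limiter.allow("198.51.100.2", now=10.5))  # True: limits are tracked per IP
</code></pre>
<p>Flask-Limiter layers persistence (Redis via Upstash here) and HTTP response handling on top of this same bookkeeping.</p>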
<h3 id="heading-llm-integration-groq-api">LLM Integration (Groq API)</h3>
<p>The refinement is powered by a <strong>Groq-hosted Large Language Model</strong> (<code>llama-3.3-70b-versatile</code>):</p>
<ul>
<li><p>Transcribed text is passed to the LLM with a carefully crafted prompt.</p>
</li>
<li><p>The <code>LLM.query_llm()</code> method sends this data via a secure API request.</p>
</li>
<li><p>The API key (<code>GROQ_API_KEY</code>) is stored as an environment variable and never exposed to the client.</p>
</li>
</ul>
<p>This ensures that the entire language processing logic remains server-side and protected.</p>
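<p>The request itself follows the familiar chat-completions shape. The sketch below assembles the payload; the system prompt is a stand-in for the backend's actual prompt, and the URL reflects Groq's OpenAI-compatible endpoint:</p>
<pre><code class="lang-python">GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"  # OpenAI-compatible endpoint

def build_payload(text, model="llama-3.3-70b-versatile"):
    """Assemble the chat request; this system prompt is only a stand-in."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Rewrite the user's message in a polite, professional tone."},
            {"role": "user", "content": text},
        ],
    }

# The actual call stays server-side (needs `import os, requests`); the key is
# read from the GROQ_API_KEY environment variable and never sent to clients:
# requests.post(GROQ_URL,
#               headers={"Authorization": "Bearer " + os.environ["GROQ_API_KEY"]},
#               json=build_payload(transcribed_text))
print(build_payload("send it now")["model"])  # llama-3.3-70b-versatile
</code></pre>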
<h3 id="heading-deployment-overview">Deployment Overview</h3>
<p>The backend is deployed on <strong>Render</strong>, a modern cloud platform for hosting APIs.</p>
<ul>
<li><p><strong>Server Stack</strong>: Flask app served via <code>Gunicorn</code>, a production-ready WSGI HTTP server.</p>
</li>
<li><p><strong>Environment Variables</strong>:</p>
<ul>
<li><p><code>GROQ_API_KEY</code> – LLM API key</p>
</li>
<li><p><code>REDIS_URI</code> – Redis connection string (via Upstash)</p>
</li>
</ul>
</li>
</ul>
<p>Uploaded audio files are <strong>never persisted</strong> long-term; they are deleted immediately after processing.</p>
<h3 id="heading-key-dependencies">Key Dependencies</h3>
<p>The application uses the following Python libraries:</p>
<ul>
<li><p><code>Flask</code>, <code>flask-cors</code> – REST API and CORS support</p>
</li>
<li><p><code>Flask-Limiter</code>, <code>redis</code> – Rate limiting infrastructure</p>
</li>
<li><p><code>speechrecognition</code>, <code>scipy</code>, <code>soundfile</code>, <code>numpy</code> – Audio handling and transcription</p>
</li>
<li><p><code>requests</code> – Communicating with the Groq API</p>
</li>
<li><p><code>gunicorn</code> – Serving the app in production</p>
</li>
</ul>
<h2 id="heading-frontend">Frontend</h2>
<blockquote>
<p><a target="_blank" href="https://github.com/tuhindutta/Speech-Refiner"><mark>GitHub Repository</mark></a></p>
</blockquote>
<p>The <strong>Speech Refiner</strong> frontend acts as a clean and intuitive interface for interacting with the backend AI engine. It provides two primary modes of input:</p>
<ul>
<li><p><strong>Live voice recording</strong> using the browser microphone</p>
</li>
<li><p><strong>Audio file upload</strong> (WAV format)</p>
</li>
</ul>
<p>Once an audio input is submitted, it is sent to the <code>/upload</code> endpoint of the backend. Upon successful processing, users receive the <strong>original transcription</strong> and the <strong>refined, polite version</strong> of their message, rendered instantly within the interface.</p>
<h3 id="heading-tech-stack">Tech Stack</h3>
<ul>
<li><p><strong>HTML + Vanilla JavaScript</strong>: Lightweight and dependency-free</p>
</li>
<li><p><strong>Web Audio API &amp; MediaRecorder</strong>: For live voice recording</p>
</li>
<li><p><strong>Fetch API</strong>: Handles communication with the Flask backend</p>
</li>
</ul>
<h3 id="heading-packaging-for-distribution">Packaging for Distribution</h3>
<ul>
<li><p>To run the app, run:</p>
<pre><code class="lang-bash">  npx electron .
</code></pre>
</li>
<li><p>To build the <code>.exe</code>:</p>
<pre><code class="lang-bash">  npm install
  npx electron-packager . SpeechRefiner --platform=win32 --arch=x64 --icon=favicon.ico --overwrite
</code></pre>
<blockquote>
<p>This will create a packaged version of the app using Electron Packager or Electron Forge (as configured).</p>
</blockquote>
</li>
</ul>
<h3 id="heading-security-considerations">Security Considerations</h3>
<p>Since audio data is sensitive, the design ensures that <strong>no processing happens on the client side</strong>. All voice inputs are securely transmitted to the backend over HTTPS, where transcription and LLM processing take place. The backend API URL is abstracted, and no secrets or tokens are exposed on the frontend.</p>
<p>This separation of concerns ensures a secure and privacy-respecting user experience.</p>
<h3 id="heading-content-security-policy-csp-troubleshooting">Content Security Policy (CSP) Troubleshooting</h3>
<ol>
<li><p>Local API vs Hosted API Issues</p>
<ul>
<li><p>Issue: Hosted API (<code>http://192.168.x.x:5000</code>) didn’t respond as expected inside Electron.</p>
</li>
<li><p>Cause: Hosted API had longer response time (~5 sec) with no visual feedback.</p>
</li>
<li><p>Fixes:</p>
<ul>
<li><p>Added a <code>Processing...</code> loader during API call.</p>
</li>
<li><p>Verified API response behavior using Postman/browser.</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p>Unsupported Audio Format (WebM instead of WAV)</p>
<ul>
<li><p>Error:</p>
<pre><code class="lang-bash">  File format b<span class="hljs-string">'\x1aE\xdf\xa3'</span> not understood. Only <span class="hljs-string">'RIFF'</span> and <span class="hljs-string">'RIFX'</span> supported.
</code></pre>
</li>
<li><p>Cause: MediaRecorder API defaulted to WebM; Flask expected <code>.wav</code>.</p>
</li>
<li><p>Fix: Switched to <code>recorder.js</code> which generates proper <code>.wav</code> output.</p>
</li>
</ul>
</li>
<li><p>Invalid WAV Header (nAvgBytesPerSec mismatch)</p>
<ul>
<li><p>Error:</p>
<pre><code class="lang-bash">  WAV header is invalid: nAvgBytesPerSec must equal product of nSamplesPerSec and nBlockAlign
</code></pre>
</li>
<li><p>Cause: Some versions of Recorder.js generated incorrect headers.</p>
</li>
<li><p>Fixes:</p>
<ul>
<li><p>CDN failed due to MIME issues.</p>
</li>
<li><p>Forked versions had broken links.</p>
</li>
<li><p>Manually downloaded corrected <code>recorder.js</code> from GitHub and loaded it locally.</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p>MIME Type Execution Errors</p>
<ul>
<li><p>Error:</p>
<pre><code class="lang-bash">  Refused to execute script from CDN because its MIME <span class="hljs-built_in">type</span> was <span class="hljs-string">'text/plain'</span>
</code></pre>
</li>
<li><p>Fix: Used local version of recorder.js:</p>
<pre><code class="lang-xml">  <span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"recorder.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
</code></pre>
</li>
</ul>
</li>
<li><p>Electron Warning</p>
<ul>
<li><p>Message:</p>
<pre><code class="lang-bash">  Insecure Content-Security-Policy: no CSP or unsafe-eval used
</code></pre>
</li>
<li><p>Strict CSP Attempt</p>
<pre><code class="lang-xml">  <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">http-equiv</span>=<span class="hljs-string">"Content-Security-Policy"</span> <span class="hljs-attr">content</span>=<span class="hljs-string">"default-src 'self'; script-src 'self'; connect-src http://192.168.x.x:5000;"</span>&gt;</span>
</code></pre>
<ul>
<li><p>Issue:</p>
<ul>
<li><p>Inline scripts blocked.</p>
</li>
<li><p>Microphone stopped.</p>
</li>
<li><p>API hit prematurely without file.</p>
</li>
</ul>
</li>
<li><p>Root Cause</p>
<ul>
<li><p>Electron apps commonly use inline scripts or libraries requiring relaxed policies.</p>
</li>
<li><p>Strict CSP blocks <code>eval</code>, inline JavaScript, dynamic execution.</p>
</li>
</ul>
</li>
<li><p>Solutions Attempted</p>
<ul>
<li><p>Tried relaxed CSP with:</p>
<pre><code class="lang-javascript">  script-src <span class="hljs-string">'self'</span> <span class="hljs-string">'unsafe-inline'</span>
</code></pre>
</li>
<li><p>Inline scripts worked, but reintroduced security risks (e.g., XSS).</p>
</li>
</ul>
</li>
<li><p>Final Decision</p>
<ul>
<li><p>CSP not applied now due to dev-time constraints.</p>
</li>
<li><p>Plan:</p>
<ul>
<li><p>Keep .exe private.</p>
</li>
<li><p>Share code with API placeholder.</p>
</li>
<li><p>Let users build locally and request API key if needed.</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<blockquote>
<h5 id="heading-recorderjs-is-downloaded-from-herehttpsrawgithubusercontentcomsophisterrecorderjs-exmasterdistrecorderjs"><code>recorder.js</code> is downloaded from <a target="_blank" href="https://raw.githubusercontent.com/sophister/recorderjs-ex/master/dist/recorder.js"><mark>here</mark></a>.</h5>
</blockquote>
<h3 id="heading-recommendations-for-future-deployment"><strong>Recommendations for Future Deployment</strong></h3>
<ul>
<li><p>Extract all inline scripts into external files.</p>
</li>
<li><p>Set a strict and secure CSP header.</p>
</li>
<li><p>Remove <code>unsafe-inline</code> and <code>unsafe-eval</code>.</p>
</li>
<li><p>Validate microphone permissions and backend headers for production use.</p>
</li>
</ul>
]]></content:encoded></item></channel></rss>