<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[TechTrail]]></title><description><![CDATA[TechTrail]]></description><link>https://techtrail.tuhindutta.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1753638509901/b4cd13d7-1900-403d-8229-375539a4170c.png</url><title>TechTrail</title><link>https://techtrail.tuhindutta.com</link></image><generator>RSS for Node</generator><lastBuildDate>Wed, 15 Apr 2026 00:22:53 GMT</lastBuildDate><atom:link href="https://techtrail.tuhindutta.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Turn Your Portfolio Into an AI Chatbot]]></title><description><![CDATA[People build resumes. Some build portfolio websites. A few create Notion pages.Most of it gets skimmed and forgotten.
A conversational portfolio is different.It talks. It adapts. It answers exactly what someone cares about.And you don’t need to manag...]]></description><link>https://techtrail.tuhindutta.com/turn-your-portfolio-into-an-ai-chatbot</link><guid isPermaLink="true">https://techtrail.tuhindutta.com/turn-your-portfolio-into-an-ai-chatbot</guid><category><![CDATA[huggingface]]></category><category><![CDATA[groq]]></category><category><![CDATA[AI Chatbot]]></category><category><![CDATA[portfolio]]></category><category><![CDATA[llm]]></category><category><![CDATA[conversational-ai]]></category><category><![CDATA[personal branding]]></category><category><![CDATA[huggingface-spaces]]></category><category><![CDATA[api integration]]></category><dc:creator><![CDATA[Tuhin Kumar Dutta]]></dc:creator><pubDate>Sun, 04 Jan 2026 06:03:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767541452692/f42af618-a975-4df5-bc65-68e68f38d205.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>People build resumes. Some build portfolio websites. A few create Notion pages.<br />Most of it gets skimmed and forgotten.</p>
<p>A conversational portfolio is different.<br />It talks. It adapts. It answers exactly what someone cares about.<br />And you don’t need to manage servers, rent GPUs, or write complex infra scripts to do it.</p>
<p>In this guide, I’ll show you exactly how to duplicate my <strong>Virtual Portfolio Chatbot</strong> hosted on Hugging Face Spaces and turn it into your own digital twin.</p>
<hr />
<h2 id="heading-what-youll-get">What You’ll Get</h2>
<p>By the end, you will have:</p>
<ul>
<li><p>A ready-to-use AI chatbot hosted on Hugging Face</p>
</li>
<li><p>Backed by Groq API</p>
</li>
<li><p>Customized with your achievements, knowledge, tone, personality</p>
</li>
<li><p>Zero infrastructure headaches</p>
</li>
<li><p>No backend setup required</p>
</li>
</ul>
<p>My live example:<br /><a target="_blank" href="https://huggingface.co/spaces/tkdutta/virtual_tkd">https://huggingface.co/spaces/tkdutta/virtual_tkd</a></p>
<p>Repo (if you want to explore or understand structure):<br /><a target="_blank" href="https://huggingface.co/spaces/tkdutta/virtual_tkd/tree/main">https://huggingface.co/spaces/tkdutta/virtual_tkd/tree/main</a></p>
<hr />
<h2 id="heading-why-i-built-this">Why I Built This</h2>
<p>Three simple reasons:</p>
<ol>
<li><p>Static portfolios feel lifeless</p>
</li>
<li><p>I wanted something interactive, memorable and dynamic</p>
</li>
<li><p>I wanted to integrate it into my <a target="_blank" href="https://www.tuhindutta.com/">portfolio app</a></p>
</li>
</ol>
<p>This chatbot:</p>
<ul>
<li><p>represents me</p>
</li>
<li><p>answers questions about me</p>
</li>
<li><p>adapts as I grow</p>
</li>
<li><p>and honestly, feels more “alive” than a PDF</p>
</li>
</ul>
<hr />
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>You only need:</p>
<ul>
<li><p>Hugging Face account</p>
</li>
<li><p>Groq API key (get your own from <a target="_blank" href="https://console.groq.com/keys">https://console.groq.com/keys</a>)</p>
</li>
<li><p>Two Google Drive <code>.txt</code> files</p>
</li>
</ul>
<p>That’s it.</p>
<hr />
<h1 id="heading-step-1-duplicate-the-space">🛠 Step 1: Duplicate the Space</h1>
<p>Go to:<br /><a target="_blank" href="https://huggingface.co/spaces/tkdutta/virtual_tkd">https://huggingface.co/spaces/tkdutta/virtual_tkd</a></p>
<p>Click <strong>Duplicate Space</strong>.</p>
<p>During duplication, Hugging Face will already give you fields to:</p>
<ul>
<li><p>Add <strong>Secrets</strong></p>
</li>
<li><p>Add <strong>Variables</strong></p>
</li>
</ul>
<p>You can set them right there or do it later. Either is fine.</p>
<p>Keep:</p>
<ul>
<li><p>Default CPU hardware (good enough)</p>
</li>
<li><p>Public or Private as per preference</p>
</li>
</ul>
<p>Done. You now have your own copy.</p>
<hr />
<h1 id="heading-step-2-add-space-secrets">🔐 Step 2: Add Space Secrets</h1>
<p>Go to:<br />Settings → Secrets</p>
<p>Add these.</p>
<hr />
<h3 id="heading-1-groqapikey">1️⃣ GROQ_API_KEY</h3>
<p>Your Groq API key.<br />Self-explanatory.</p>
<hr />
<h3 id="heading-2-knowledgecontexturl">2️⃣ KNOWLEDGE_CONTEXT_URL</h3>
<p>A public Google Drive link to a <strong>.txt file</strong> containing your knowledge.</p>
<p>What should be inside?<br />You decide:</p>
<ul>
<li><p>Work history</p>
</li>
<li><p>Projects</p>
</li>
<li><p>Achievements</p>
</li>
<li><p>Experience stories</p>
</li>
<li><p>Tech stack</p>
</li>
<li><p>Anything you want your chatbot to “know”</p>
</li>
</ul>
<p>This basically acts as your knowledge base.</p>
<p>Make sure:</p>
<ul>
<li><p>File is <code>.txt</code></p>
</li>
<li><p>Publicly shareable</p>
</li>
<li><p>Viewable by anyone</p>
</li>
</ul>
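<p>One gotcha (this helper is my own illustration, not part of the Space’s code): the file is fetched over HTTP, so a regular Drive share link that opens the preview page will not return raw text. A common pattern is to convert the share link into a direct-download URL:</p>

```python
import re

def drive_direct_url(share_url):
    """Turn a Google Drive share link (https://drive.google.com/file/d/FILE_ID/view...)
    into a direct-download URL that returns the raw file contents."""
    match = re.search(r"/d/([\w-]+)", share_url)
    if not match:
        raise ValueError("not a recognized Google Drive share link")
    return f"https://drive.google.com/uc?export=download&id={match.group(1)}"
```

<p>Fetching <code>drive_direct_url(link)</code> with any HTTP client then returns the file body instead of an HTML preview page.</p>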
<hr />
<h3 id="heading-3-instructionsurl">3️⃣ INSTRUCTIONS_URL</h3>
<p>Another Google Drive <code>.txt</code> file.</p>
<p>This shapes the chatbot’s:</p>
<ul>
<li><p>tone</p>
</li>
<li><p>behavior</p>
</li>
<li><p>personality</p>
</li>
<li><p>conversational style</p>
</li>
<li><p>boundaries</p>
</li>
<li><p>how it introduces you</p>
</li>
<li><p>how it replies to recruiters or visitors</p>
</li>
</ul>
<p>Think of this as your “system prompt.”<br />It defines your chatbot’s personality and behavior.</p>
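<p>To make the split between the two files concrete, here is a rough sketch of how they can be combined per request. The <code>build_messages</code> helper and its names are hypothetical, for illustration only; the actual Space code may differ:</p>

```python
def build_messages(instructions, knowledge, question):
    """Combine the instructions file (persona) and the knowledge file (facts)
    into a chat-completion message list. Hypothetical helper, for illustration."""
    system_prompt = (
        f"{instructions.strip()}\n\n"
        "Answer using only the following knowledge about the portfolio owner:\n"
        f"{knowledge.strip()}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]
```

<p>The returned list is the shape an OpenAI-compatible chat endpoint (like Groq’s) expects as its <code>messages</code> argument.</p>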
<hr />
<h1 id="heading-step-3-add-space-variables">⚙️ Step 3: Add Space Variables</h1>
<p>Go to:<br />Settings → Variables</p>
<p>These are simple configurable values.</p>
<hr />
<h3 id="heading-currentstatusmessage">CURRENT_STATUS_MESSAGE</h3>
<p>Your current status or what you’re focusing on.</p>
<p>Example:</p>
<pre><code class="lang-plaintext">Learning Terraform and preparing for certification.
</code></pre>
<p>This keeps changing over time, so keeping it separate avoids touching code repeatedly.</p>
<hr />
<h3 id="heading-lastknowledgeupdateddate">LAST_KNOWLEDGE_UPDATED_DATE</h3>
<p>To be transparent about the freshness of your knowledge base.</p>
<p>Example:</p>
<pre><code class="lang-plaintext">Dec 2025
</code></pre>
<p>Displayed in the chatbot UI footer so users know how up to date the information is.</p>
<hr />
<h3 id="heading-creatorname">CREATOR_NAME</h3>
<p>Name of the creator or portfolio owner.<br />This is displayed in the chatbot UI footer.</p>
<p>Example:</p>
<pre><code class="lang-plaintext">Tuhin Kumar Dutta
</code></pre>
<hr />
<h3 id="heading-website">WEBSITE</h3>
<p>Optional field.<br />If provided, this URL will be linked with the creator name in the footer.</p>
<p>Example:</p>
<pre><code class="lang-plaintext">https://www.tuhindutta.com/
</code></pre>
<p>If you don’t want to link anything, simply leave it empty.</p>
<hr />
<h1 id="heading-updating-and-rebuilding">🔄 Updating and Rebuilding</h1>
<p>Whenever:</p>
<ul>
<li><p>your Google Drive knowledge file changes</p>
</li>
<li><p>or your instruction file changes</p>
</li>
</ul>
<p>Go to Hugging Face and simply <strong>Rebuild the Space</strong>.</p>
<p>That reloads fresh content.</p>
<hr />
<h1 id="heading-step-4-test-it">🧪 Step 4: Test It</h1>
<p>Open your chatbot.</p>
<p>Ask:</p>
<ul>
<li><p>“Tell me about your experience”</p>
</li>
<li><p>“What projects have you built?”</p>
</li>
<li><p>“What are you currently working on?”</p>
</li>
<li><p>“Explain your expertise”</p>
</li>
</ul>
<p>If it sounds like you, great.<br />If not, tune your <code>.txt</code> files and rebuild.</p>
<hr />
<h1 id="heading-important-behavior-notes">📌 Important Behavior Notes</h1>
<h3 id="heading-1-groq-api-rate-limits">1️⃣ Groq API Rate Limits</h3>
<p>Your chatbot follows:</p>
<ul>
<li><p>Groq API rate limits</p>
</li>
<li><p>Model usage caps</p>
</li>
</ul>
<p>If you want higher usage or smoother performance, upgrade your Groq plan.</p>
<hr />
<h3 id="heading-2-hugging-face-free-tier-sleep-policy">2️⃣ Hugging Face Free Tier Sleep Policy</h3>
<p>If you keep HF Space on free CPU tier:</p>
<ul>
<li><p>It <strong>goes to sleep after 48 hours of inactivity</strong></p>
</li>
<li><p>To keep it alive, open it at least once every 48 hours</p>
</li>
<li><p>Or get an HF paid subscription to keep it always active</p>
</li>
</ul>
<p>So decide based on how actively you want it running.</p>
<hr />
<h3 id="heading-3-no-conversation-memory">3️⃣ No Conversation Memory</h3>
<p>The chatbot does not track past chats.</p>
<p>Each message is handled as a fresh independent query.</p>
<p>Honestly, for a portfolio bot, this is good enough and keeps things simple.</p>
<hr />
<h3 id="heading-4-why-use-hugging-face-instead-of-hosting-yourself">4️⃣ Why Use Hugging Face Instead of Hosting Yourself?</h3>
<p>Because:</p>
<ul>
<li><p>Zero infra maintenance</p>
</li>
<li><p>Built-in logs</p>
</li>
<li><p>Built-in hosting</p>
</li>
<li><p>No deployment hassles</p>
</li>
<li><p>Easy duplication</p>
</li>
<li><p>Simple updates</p>
</li>
</ul>
<p>This entire HF Space approach exists to make life easier.</p>
<p>If someone wants, they can still:</p>
<ul>
<li><p>use the repo code</p>
</li>
<li><p>build their own app</p>
</li>
<li><p>deploy it wherever they want</p>
</li>
</ul>
<p>But most people simply want it to work without pain. HF solves that.</p>
<hr />
<h3 id="heading-5-you-can-use-your-own-domain">5️⃣ You Can Use Your Own Domain</h3>
<p>Yes, you can point a custom domain at your HF Space.</p>
<p>So your chatbot can live behind a cleaner URL.</p>
<hr />
<h1 id="heading-final-thoughts">🎯 Final Thoughts</h1>
<p>This chatbot is meant to be:</p>
<ul>
<li><p>simple</p>
</li>
<li><p>personal</p>
</li>
<li><p>useful</p>
</li>
<li><p>easy to maintain</p>
</li>
</ul>
<p>No over-engineering.<br />No unnecessary complexity.</p>
<p>Just a neat digital twin that represents you well.</p>
<p>If you build yours, I’d genuinely love to see it.</p>
]]></content:encoded></item><item><title><![CDATA[Concurrency in Python]]></title><description><![CDATA[Python’s concurrency tools let you optimize programs by running tasks in overlapping periods—great for cutting down wait times in I/O-heavy work and taking advantage of multiple CPU cores for compute-intensive tasks. In this article, I’m focusing on ...]]></description><link>https://techtrail.tuhindutta.com/concurrency-in-python</link><guid isPermaLink="true">https://techtrail.tuhindutta.com/concurrency-in-python</guid><category><![CDATA[Python]]></category><category><![CDATA[concurrency]]></category><category><![CDATA[optimization]]></category><category><![CDATA[Threading]]></category><category><![CDATA[multithreading]]></category><category><![CDATA[multiprocessing]]></category><category><![CDATA[asyncio]]></category><category><![CDATA[asynchronous programming]]></category><category><![CDATA[parallelism]]></category><category><![CDATA[GIL]]></category><category><![CDATA[producer consumer]]></category><dc:creator><![CDATA[Tuhin Kumar Dutta]]></dc:creator><pubDate>Fri, 26 Dec 2025 17:46:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/cG9xY_1NBcA/upload/a6ab8f3cbcc64ddeb4a232a68c47242f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Python’s concurrency tools let you optimize programs by running tasks in overlapping periods—great for cutting down wait times in I/O-heavy work and taking advantage of multiple CPU cores for compute-intensive tasks. In this article, I’m focusing on the most practical and widely used approaches: <strong>multithreading</strong> for I/O-bound workloads, <strong>multiprocessing</strong> for CPU-bound tasks, and <strong>asyncio</strong> for high-scale, single-threaded efficiency.</p>
<p>Concurrency in Python is a deep topic, and no single article can cover every nuance. There are advanced patterns, tricky edge cases, and evolving best practices you’ll discover over time. Treat this as a solid starting point. Then experiment, break things, debug them, and refine your understanding—hands-on work is what really makes these concepts click.</p>
<p>And please, read this sequentially from start to finish… not concurrently. Otherwise, you may end up with a race condition in your understanding. 😄</p>
<p>Let’s dive in!</p>
<h1 id="heading-what-is-concurreny">What is concurrency?</h1>
<p>Tasks are said to run concurrently when they execute simultaneously, or in an interleaved, seemingly (pseudo-)simultaneous way, making efficient use of time and compute resources.</p>
<hr />
<h1 id="heading-why-is-it-required">Why is it required?</h1>
<ol>
<li><h2 id="heading-io-bound-task">I/O bound task</h2>
<p> Simply put, by default Python code runs sequentially. Whenever a blocking task is encountered, the execution waits until the current task completes before moving forward. In some scenarios, this wait period is long enough that system resources remain largely idle. We can take advantage of this window to execute other tasks and return once the original task is ready to continue.</p>
</li>
<li><h2 id="heading-cpu-bound-task">CPU bound task</h2>
<p> There are cases where multiple tasks require heavy CPU processing. With sequential execution, each task must finish before the next one starts. Using multiprocessing, these tasks can be distributed across different CPU cores and executed in parallel, reducing overall execution time.</p>
</li>
</ol>
<hr />
<h1 id="heading-important-terminologies">Important Terminologies</h1>
<h2 id="heading-thread"><strong>Thread</strong></h2>
<p>It is a unit of execution within a process that runs a piece of code. By default, since we have a single thread, all the tasks are handled by the same thread sequentially.</p>
<p>For example, say we have 3 tasks:</p>
<ul>
<li><p>Load <code>customers</code> dataset.</p>
</li>
<li><p>Load <code>accounts</code> dataset.</p>
</li>
<li><p>Load <code>orders</code> dataset.</p>
</li>
</ul>
<p>Multi-threading allows us to create multiple threads to perform the tasks concurrently.</p>
<p><strong><em>Note: This is not parallel computing or multiprocessing; the exact difference is discussed later in the Threading section.</em></strong></p>
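<p>The three dataset loads above can be sketched like this (the <code>load_dataset</code> function is a stand-in that simulates blocking I/O with <code>sleep</code>):</p>

```python
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter, sleep

def load_dataset(name):
    sleep(0.2)  # stand-in for a blocking I/O wait (network or disk)
    return f"{name} loaded"

start = perf_counter()
# Three threads wait on their "I/O" at the same time instead of one after another
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(load_dataset, ["customers", "accounts", "orders"]))
elapsed = perf_counter() - start

print(results)             # results arrive in submission order
print(f"{elapsed:.2f}s")   # roughly 0.2s instead of ~0.6s sequentially
```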
<h2 id="heading-multiprocess"><strong>Multiprocess</strong></h2>
<p>Multiprocessing is the ability to execute tasks using multiple processes. Each process has its own memory space and can run on a separate CPU core, enabling true parallel execution.</p>
<h2 id="heading-cpython">CPython</h2>
<p>CPython, the reference implementation of Python, is:</p>
<ol>
<li><p>a <strong>bytecode interpreter</strong></p>
</li>
<li><p>written in <strong>C</strong></p>
</li>
<li><p>managing <strong>millions of tiny objects</strong></p>
</li>
<li><p>optimized for <strong>single-thread performance</strong></p>
</li>
<li><p>The core technical problem CPython faces:</p>
<ol>
<li><p>Every Python object has reference counts, mutable internal state, shared memory.</p>
</li>
<li><p>So CPython must ensure:</p>
<ul>
<li><p>reference counts stay correct</p>
</li>
<li><p>objects aren’t freed while still in use</p>
</li>
<li><p>memory isn’t corrupted</p>
</li>
</ul>
</li>
</ol>
</li>
</ol>
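<p>Reference counting is easy to observe (note that <code>sys.getrefcount</code> itself temporarily adds one reference to its argument):</p>

```python
import sys

data = []
before = sys.getrefcount(data)  # includes the temporary reference held by getrefcount itself
alias = data                    # create a second reference to the same list object
after = sys.getrefcount(data)

print(before, after)  # "after" is exactly one higher than "before"
```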
<h2 id="heading-gil"><strong>GIL</strong></h2>
<p>The <strong>Global Interpreter Lock</strong> (GIL) is a mechanism in CPython that allows only one thread to execute Python bytecode at a time, which limits true parallelism in multithreaded, CPU-bound programs.</p>
<p>The GIL ensures:</p>
<ul>
<li><p>only <strong>one thread</strong> executes Python bytecode at a time</p>
</li>
<li><p>threads switch at well-defined points</p>
</li>
<li><p>memory state stays consistent</p>
</li>
</ul>
<p>It does <strong>not</strong>:</p>
<ul>
<li><p>prevent I/O parallelism</p>
</li>
<li><p>block native code from running</p>
</li>
<li><p>affect multiprocessing</p>
</li>
</ul>
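<p>Those “well-defined points” are visible in CPython: the interpreter periodically asks the running thread to release the GIL, and the interval is exposed through <code>sys</code>:</p>

```python
import sys

# How often (in seconds) CPython asks the running thread to drop the GIL
# so another runnable thread can acquire it.
interval = sys.getswitchinterval()
print(interval)  # typically 0.005 (5 ms) by default

# The interval is tunable, e.g. for latency-sensitive threaded code:
sys.setswitchinterval(0.001)
```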
<p><strong><em>Why is GIL required?</em></strong></p>
<ol>
<li><p>Without a GIL, <strong>every single object operation</strong> would need locks.</p>
</li>
<li><p>CPython designers chose one global lock instead of many tiny locks.</p>
</li>
<li><p>This dramatically simplifies:</p>
<ul>
<li><p>memory management</p>
</li>
<li><p>garbage collection</p>
</li>
<li><p>C-extension APIs</p>
</li>
<li><p>interpreter correctness</p>
</li>
</ul>
</li>
<li><p>Removing the GIL means building a completely new Python interpreter:</p>
<ul>
<li><p>rewriting memory management</p>
</li>
<li><p>redesigning object model</p>
</li>
<li><p>breaking C extensions</p>
</li>
<li><p>slowing down single-thread code</p>
</li>
<li><p>introducing subtle race bugs</p>
</li>
</ul>
</li>
</ol>
<h2 id="heading-other-definitions">Other definitions</h2>
<p>Refer to the following definitions when you encounter the terms while reading for better context:</p>
<ul>
<li><p><strong>Race Condition</strong> – A race condition happens when two or more threads or processes try to access and modify the same shared resource at the same time, and the final outcome depends on the order in which those operations happen. Since this order is unpredictable, the result becomes inconsistent, incorrect, and difficult to debug.</p>
<p>  Example<br />  Imagine you have a shared variable <code>balance = 100</code>.<br />  Two threads are trying to withdraw <code>50</code> at the same time.</p>
<p>  Both read the value as <code>100</code>, both subtract <code>50</code>, and both write back <code>50</code>.<br />  Logically, the balance should be <code>0</code>, but you end up with <code>50</code>. That is a race condition.</p>
<p>  Why it happens</p>
<ul>
<li><p>Shared resource</p>
</li>
<li><p>Multiple threads accessing it</p>
</li>
<li><p>No proper coordination or locking</p>
</li>
</ul>
</li>
</ul>
<p>    Result</p>
<ul>
<li><p>Random output</p>
</li>
<li><p>Rare bugs</p>
</li>
<li><p>Inconsistent behavior</p>
</li>
</ul>
<ul>
<li><p><strong>Event Loop</strong> – The event loop is the core of <code>asyncio</code>. It continuously runs in a single thread, schedules tasks, and switches between coroutines whenever they pause on an awaited operation. Instead of waiting idly, the event loop keeps other tasks moving, which enables concurrency without using multiple threads.</p>
</li>
<li><p><strong>Coroutines</strong> – Coroutines are special functions defined with <code>async def</code> that support asynchronous execution. Instead of blocking, they pause using <code>await</code> while waiting for I/O or other asynchronous work, allowing the event loop to run other coroutines in the meantime.</p>
</li>
<li><p><strong>Non-Blocking Operation</strong> – A non-blocking operation is an operation that does not stop execution while waiting for a result. Instead of freezing the program, it immediately returns control and resumes later when the result is ready. In asyncio, most I/O operations (like network calls, file operations, timers, etc.) are non-blocking, allowing other tasks to run during the wait time.</p>
</li>
</ul>
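<p>A minimal sketch ties these three terms together: two coroutines <code>await</code> a non-blocking sleep, and the event loop overlaps their wait time:</p>

```python
import asyncio
from time import perf_counter

async def fetch(name, delay):
    await asyncio.sleep(delay)  # non-blocking: yields control back to the event loop
    return f"{name} done"

async def main():
    # gather schedules both coroutines; the event loop interleaves their waits
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2))

start = perf_counter()
results = asyncio.run(main())  # asyncio.run creates and drives the event loop
elapsed = perf_counter() - start

print(results)             # ['a done', 'b done']
print(f"{elapsed:.2f}s")   # about 0.2s, not 0.4s: the waits overlapped
```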
<hr />
<h1 id="heading-models-for-concurrency">Models for concurrency</h1>
<h2 id="heading-multithreading">Multithreading</h2>
<ul>
<li><p>Used primarily for I/O-bound tasks, where threads spend most of their time waiting on OS I/O, not executing Python bytecode.</p>
</li>
<li><p>In such scenarios, multiple threads are used.</p>
</li>
<li><p>Each thread is responsible for a particular task; threads execute their tasks but may share data and state.</p>
</li>
<li><p>The thread releases the GIL only when it enters a blocking I/O operation implemented in C that explicitly releases the GIL.</p>
</li>
<li><p>Another runnable thread may acquire the GIL and continue executing Python bytecode.</p>
</li>
<li><p>Thus, while some threads are blocked on I/O, other threads can make progress, reducing idle CPU time.</p>
</li>
<li><p>All threads share:</p>
<ul>
<li><p>variables</p>
</li>
<li><p>memory</p>
</li>
<li><p>interpreter</p>
</li>
<li><p>GIL</p>
</li>
</ul>
</li>
<li><p>This sharing is why data corruption through <strong>race conditions</strong> is possible.</p>
</li>
<li><p>Threads <em>can</em> manipulate shared global variables, which requires explicit synchronization to avoid race conditions.</p>
</li>
</ul>
<h3 id="heading-demo">Demo</h3>
<p>Using the <a target="_blank" href="https://docs.anilist.co/guide/graphql/">Anilist API</a> to demonstrate the difference in execution time between the synchronous and asynchronous approaches for 15 requests.</p>
<p><strong>AnimeAPI:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> requests

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">AnimeAPI</span>:</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        self.query = <span class="hljs-string">'''
query ($id: Int) { # Define which variables will be used in the query (id)
  Media (id: $id, type: ANIME) { # Insert our variables into the query arguments (id) (type: ANIME is hard-coded in the query)
    id
    title {
      romaji
      english
      native
    }
  }
}
'''</span>
        self.url = <span class="hljs-string">'https://graphql.anilist.co'</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">response</span>(<span class="hljs-params">self, anime_id:int</span>):</span>
        variables = {
            <span class="hljs-string">'id'</span>: anime_id
        }
        response = requests.post(self.url, json={<span class="hljs-string">'query'</span>: self.query, <span class="hljs-string">'variables'</span>: variables})
        <span class="hljs-keyword">return</span> response.json()
</code></pre>
<p><strong>Multithreading execution:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> api <span class="hljs-keyword">import</span> AnimeAPI
<span class="hljs-keyword">from</span> time <span class="hljs-keyword">import</span> perf_counter
<span class="hljs-keyword">from</span> concurrent.futures <span class="hljs-keyword">import</span> ThreadPoolExecutor, as_completed

anime = AnimeAPI()

anime_ids = [<span class="hljs-number">100</span>,<span class="hljs-number">200</span>,<span class="hljs-number">300</span>,<span class="hljs-number">400</span>, <span class="hljs-number">700</span>, <span class="hljs-number">628</span>, <span class="hljs-number">524</span>, <span class="hljs-number">377</span>, <span class="hljs-number">826</span>, <span class="hljs-number">451</span>, <span class="hljs-number">280</span>, <span class="hljs-number">395</span>, <span class="hljs-number">399</span>, <span class="hljs-number">124</span>, <span class="hljs-number">626</span>]


<span class="hljs-comment"># Sync code #############################################################################</span>

anime_dic = {}

sync_start = perf_counter()

<span class="hljs-keyword">for</span> anime_id <span class="hljs-keyword">in</span> anime_ids:
    <span class="hljs-keyword">try</span>:
        anime_dic[anime_id] = anime.response(anime_id)
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        anime_dic[anime_id] = e

sync_end = perf_counter()

<span class="hljs-comment">#####################################################################################</span>

<span class="hljs-comment"># Async code ############################################################################</span>

anime_dic_async = {}

async_start = perf_counter()

<span class="hljs-keyword">with</span> ThreadPoolExecutor(max_workers=<span class="hljs-number">5</span>) <span class="hljs-keyword">as</span> executor:
    futures = {executor.submit(anime.response, i): i <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> anime_ids}

    <span class="hljs-keyword">for</span> future <span class="hljs-keyword">in</span> as_completed(futures):

        anime_id = futures[future]

        <span class="hljs-keyword">try</span>:
            data = future.result()
        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            data = e

        anime_dic_async[anime_id] = data

async_end = perf_counter()

<span class="hljs-comment">#####################################################################################</span>

sync_time = sync_end - sync_start
async_time = async_end - async_start

print(<span class="hljs-string">f'''
Number of animes requested synchronously: <span class="hljs-subst">{len(anime_ids)}</span>
Number of outputs received synchronously: <span class="hljs-subst">{len(anime_dic)}</span>
Time taken for synchronous execution: <span class="hljs-subst">{sync_time}</span>

##############################################################

Number of animes requested asynchronously: <span class="hljs-subst">{len(anime_ids)}</span>
Number of outputs received asynchronously: <span class="hljs-subst">{len(anime_dic_async)}</span>
Time taken for asynchronous execution: <span class="hljs-subst">{async_time}</span>
'''</span>)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-bash">Number of animes requested synchronously: 15
Number of outputs received synchronously: 15
Time taken <span class="hljs-keyword">for</span> synchronous execution: 10.28090550005436

<span class="hljs-comment">##############################################################</span>

Number of animes requested asynchronously: 15
Number of outputs received asynchronously: 15
Time taken <span class="hljs-keyword">for</span> asynchronous execution: 3.0076786999125034
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Note the significant reduction in the execution time in concurrency.</div>
</div>

<p><strong>Race condition demo:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> time <span class="hljs-keyword">import</span> sleep
<span class="hljs-keyword">from</span> threading <span class="hljs-keyword">import</span> Thread

balance = <span class="hljs-number">1000</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">withdraw</span>(<span class="hljs-params">amount: float</span>):</span>
    <span class="hljs-keyword">global</span> balance
    temp = balance
    sleep(<span class="hljs-number">0.001</span>)
    balance = temp - amount

threads = [Thread(target=withdraw, args=[<span class="hljs-number">50</span>]) <span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(<span class="hljs-number">4</span>)]

<span class="hljs-keyword">for</span> thread <span class="hljs-keyword">in</span> threads:
    thread.start()

<span class="hljs-keyword">for</span> thread <span class="hljs-keyword">in</span> threads:
    thread.join()

print(<span class="hljs-string">f"Expected: 800, Actual: <span class="hljs-subst">{balance}</span>"</span>)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-bash">Expected: 800, Actual: 950
</code></pre>
<p><strong>Solving race condition using Lock:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> time <span class="hljs-keyword">import</span> sleep
<span class="hljs-keyword">from</span> threading <span class="hljs-keyword">import</span> Thread, Lock, current_thread

balance = <span class="hljs-number">1000</span>

lock = Lock()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">withdraw</span>(<span class="hljs-params">amount: float</span>):</span>
    <span class="hljs-keyword">global</span> balance
    <span class="hljs-keyword">with</span> lock:
        temp = balance
        print(<span class="hljs-string">f"Thread <span class="hljs-subst">{current_thread()}</span>: Reading balance = <span class="hljs-subst">{temp}</span>"</span>)
        sleep(<span class="hljs-number">0.001</span>)
        balance = temp - amount
        print(<span class="hljs-string">f"Thread <span class="hljs-subst">{current_thread()}</span>: New balance = <span class="hljs-subst">{balance}</span>"</span>)

threads = [Thread(target=withdraw, args=[<span class="hljs-number">50</span>]) <span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(<span class="hljs-number">4</span>)]

<span class="hljs-keyword">for</span> thread <span class="hljs-keyword">in</span> threads:
    thread.start()

<span class="hljs-keyword">for</span> thread <span class="hljs-keyword">in</span> threads:
    thread.join()

print(<span class="hljs-string">f"Expected: 800, Actual: <span class="hljs-subst">{balance}</span>"</span>)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-bash">Thread &lt;Thread(Thread-1 (withdraw), started 27604)&gt;: Reading balance = 1000
Thread &lt;<span class="hljs-keyword">function</span> current_thread at 0x000001758BD21BC0&gt;: New balance = 950
Thread &lt;Thread(Thread-2 (withdraw), started 29580)&gt;: Reading balance = 950
Thread &lt;<span class="hljs-keyword">function</span> current_thread at 0x000001758BD21BC0&gt;: New balance = 900
Thread &lt;Thread(Thread-3 (withdraw), started 29852)&gt;: Reading balance = 900
Thread &lt;<span class="hljs-keyword">function</span> current_thread at 0x000001758BD21BC0&gt;: New balance = 850
Thread &lt;Thread(Thread-4 (withdraw), started 7656)&gt;: Reading balance = 850
Thread &lt;<span class="hljs-keyword">function</span> current_thread at 0x000001758BD21BC0&gt;: New balance = 800
Expected: 800, Actual: 800
</code></pre>
<h2 id="heading-multiprocessing">Multiprocessing</h2>
<ul>
<li><p>Each process has its own threads and its own GIL, but processes do not share a GIL with each other.</p>
</li>
<li><p>When there are multiple CPU-bound tasks, they are distributed among different cores of the CPU, each handling the task in an isolated environment. Thus, multiple processes run on multiple cores.</p>
</li>
<li><p>The OS scheduler maps processes to cores, enabling true parallel execution of CPU-bound work.</p>
</li>
<li><p>Each process has its own copy of:</p>
<ul>
<li><p>variables</p>
</li>
<li><p>memory</p>
</li>
<li><p>interpreter</p>
</li>
<li><p>GIL</p>
</li>
</ul>
</li>
<li><p>Memory is not shared by default; data is exchanged between processes via serialization (pickling), which adds overhead.</p>
</li>
<li><p>Processes cannot directly manipulate the same global variable because they do not share memory space. Shared memory can be explicitly created, but it is not the default and must be managed carefully.</p>
</li>
<li><p>For orchestrating processes and transferring data between them, <code>Queue</code> is used to build multiprocessing pipelines. It lets <code>producers</code> push data and <code>consumers</code> retrieve it safely, without conflicts.</p>
</li>
</ul>
<h3 id="heading-demo-1"><strong>Demo</strong></h3>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> time <span class="hljs-keyword">import</span> perf_counter
<span class="hljs-keyword">from</span> concurrent.futures <span class="hljs-keyword">import</span> ProcessPoolExecutor, as_completed


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">calculate_factorial</span>(<span class="hljs-params">start: int, end: int</span>):</span>
    result = <span class="hljs-number">1</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(start, end + <span class="hljs-number">1</span>):
        result *= i
    <span class="hljs-comment"># Adding extra computation to make it CPU-heavy</span>
    <span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(<span class="hljs-number">2000000</span>):
        result = (result % <span class="hljs-number">1000000007</span>) * <span class="hljs-number">2</span>
    <span class="hljs-keyword">return</span> result


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    processes = [(<span class="hljs-number">2</span>,<span class="hljs-number">38</span>), (<span class="hljs-number">5</span>, <span class="hljs-number">50</span>), (<span class="hljs-number">68</span>, <span class="hljs-number">83</span>), (<span class="hljs-number">14</span>, <span class="hljs-number">57</span>),
                 (<span class="hljs-number">3</span>,<span class="hljs-number">38</span>), (<span class="hljs-number">6</span>, <span class="hljs-number">50</span>), (<span class="hljs-number">69</span>, <span class="hljs-number">83</span>), (<span class="hljs-number">15</span>, <span class="hljs-number">57</span>),
                 (<span class="hljs-number">4</span>,<span class="hljs-number">38</span>), (<span class="hljs-number">7</span>, <span class="hljs-number">50</span>), (<span class="hljs-number">70</span>, <span class="hljs-number">83</span>), (<span class="hljs-number">16</span>, <span class="hljs-number">57</span>),
                 (<span class="hljs-number">5</span>,<span class="hljs-number">38</span>), (<span class="hljs-number">8</span>, <span class="hljs-number">50</span>), (<span class="hljs-number">25</span>, <span class="hljs-number">83</span>), (<span class="hljs-number">17</span>, <span class="hljs-number">57</span>),
                 (<span class="hljs-number">6</span>,<span class="hljs-number">38</span>), (<span class="hljs-number">9</span>, <span class="hljs-number">50</span>), (<span class="hljs-number">26</span>, <span class="hljs-number">83</span>), (<span class="hljs-number">18</span>, <span class="hljs-number">57</span>)]
    cpu_count = os.cpu_count()
    cpu2use = int(<span class="hljs-number">0.5</span> * cpu_count)

    <span class="hljs-comment"># Sync code #############################################################################</span>

    results = {}

    sync_start = perf_counter()

    <span class="hljs-keyword">for</span> process <span class="hljs-keyword">in</span> processes:
        <span class="hljs-keyword">try</span>:
            results[process] = calculate_factorial(*process)
        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            results[process] = e

    sync_end = perf_counter()

    <span class="hljs-comment">#####################################################################################</span>

    <span class="hljs-comment"># Async code ############################################################################</span>

    results_async = {}

    async_start = perf_counter()

    <span class="hljs-keyword">with</span> ProcessPoolExecutor(max_workers = cpu2use) <span class="hljs-keyword">as</span> executor:
        futures = {executor.submit(calculate_factorial, *i): i <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> processes}

        <span class="hljs-keyword">for</span> future <span class="hljs-keyword">in</span> as_completed(futures):

            process_id = futures[future]

            <span class="hljs-keyword">try</span>:
                data = future.result()
            <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
                data = e

            results_async[process_id] = data

    async_end = perf_counter()

    <span class="hljs-comment">#####################################################################################</span>

    sync_time = sync_end - sync_start
    async_time = async_end - async_start

    print(<span class="hljs-string">f'''
Number of processes processed synchronously: <span class="hljs-subst">{len(processes)}</span>
Number of outputs received synchronously: <span class="hljs-subst">{len(results)}</span>
Time taken for synchronous execution: <span class="hljs-subst">{sync_time}</span>

##############################################################

Number of processes processed asynchronously: <span class="hljs-subst">{len(processes)}</span>
Number of outputs received asynchronously: <span class="hljs-subst">{len(results_async)}</span>
Time taken for asynchronous execution: <span class="hljs-subst">{async_time}</span>
'''</span>)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-bash">Number of processes processed synchronously: 20
Number of outputs received synchronously: 20
Time taken <span class="hljs-keyword">for</span> synchronous execution: 3.7746968001592904

<span class="hljs-comment">##############################################################</span>

Number of processes processed asynchronously: 20
Number of outputs received asynchronously: 20
Time taken <span class="hljs-keyword">for</span> asynchronous execution: 1.1230935999192297
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Note the significant reduction in execution time with concurrent execution.</div>
</div>

<p><strong>Queue demo</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">from</span> multiprocessing <span class="hljs-keyword">import</span> Process, Queue


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">save_data_to_json</span>(<span class="hljs-params">data:list</span>):</span>
    <span class="hljs-keyword">try</span>:
        <span class="hljs-keyword">with</span> open(<span class="hljs-string">'data.json'</span>, <span class="hljs-string">'r'</span>) <span class="hljs-keyword">as</span> file:
            loaded_data = json.load(file)
            loaded_data += data
    <span class="hljs-keyword">except</span> (FileNotFoundError, json.JSONDecodeError):
        loaded_data = data
    <span class="hljs-keyword">with</span> open(<span class="hljs-string">'data.json'</span>, <span class="hljs-string">'w'</span>, encoding=<span class="hljs-string">'utf-8'</span>) <span class="hljs-keyword">as</span> json_file:
        json.dump(loaded_data, json_file, indent=<span class="hljs-number">4</span>)
    print(<span class="hljs-string">'Data saved.'</span>)


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">producer</span>(<span class="hljs-params">queue1:Queue, items:list</span>):</span>
    <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> items:
        print(<span class="hljs-string">f'Producing <span class="hljs-subst">{item}</span>'</span>)
        queue1.put(item)
    queue1.put(<span class="hljs-literal">None</span>)
    print(<span class="hljs-string">'Producer done'</span>)


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">transformer</span>(<span class="hljs-params">queue1:Queue, queue2:Queue, factor:float</span>):</span>
    <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
        item = queue1.get()
        <span class="hljs-keyword">if</span> item <span class="hljs-keyword">is</span> <span class="hljs-literal">None</span>:
            queue2.put(<span class="hljs-literal">None</span>)
            <span class="hljs-keyword">break</span>
        transformed = factor * item
        print(<span class="hljs-string">f'Transforming <span class="hljs-subst">{item}</span> -&gt; <span class="hljs-subst">{transformed}</span>'</span>)
        queue2.put(transformed)
    print(<span class="hljs-string">'Transformer done'</span>)


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">save</span>(<span class="hljs-params">queue2:Queue</span>):</span>
    data = []
    <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
        item = queue2.get()
        <span class="hljs-keyword">if</span> item <span class="hljs-keyword">is</span> <span class="hljs-literal">None</span>:
            <span class="hljs-keyword">break</span>
        print(<span class="hljs-string">f'Getting <span class="hljs-subst">{item}</span>'</span>)
        data.append(item)
    save_data_to_json(data)


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
    <span class="hljs-keyword">if</span> os.path.exists(<span class="hljs-string">'data.json'</span>):  <span class="hljs-comment"># avoid crashing on the first run</span>
        os.remove(<span class="hljs-string">'data.json'</span>)

    data = [<span class="hljs-number">2</span>,<span class="hljs-number">4</span>,<span class="hljs-number">6</span>,<span class="hljs-number">3</span>,<span class="hljs-number">5</span>,<span class="hljs-number">7</span>,<span class="hljs-number">9</span>]

    queue1 = Queue()
    queue2 = Queue()

    processes = []

    processes.append(Process(target=producer, args=(queue1, data)))
    processes.append(Process(target=transformer, args=(queue1, queue2, <span class="hljs-number">0.2</span>)))
    processes.append(Process(target=save, args=(queue2,)))

    <span class="hljs-keyword">for</span> process <span class="hljs-keyword">in</span> processes:
        process.start()

    <span class="hljs-keyword">for</span> process <span class="hljs-keyword">in</span> processes:
        process.join()

    <span class="hljs-keyword">with</span> open(<span class="hljs-string">'data.json'</span>, <span class="hljs-string">'r'</span>) <span class="hljs-keyword">as</span> file:
        saved_data = json.load(file)

    print(saved_data)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-bash">Producing 2
Producing 4
Producing 6
Producing 3
Producing 5
Producing 7
Producing 9
Producer <span class="hljs-keyword">done</span>
Transforming 2 -&gt; 0.4
Transforming 4 -&gt; 0.8
Transforming 6 -&gt; 1.2000000000000002
Transforming 3 -&gt; 0.6000000000000001
Transforming 5 -&gt; 1.0
Transforming 7 -&gt; 1.4000000000000001
Transforming 9 -&gt; 1.8
Transformer <span class="hljs-keyword">done</span>
Getting 0.4
Getting 0.8
Getting 1.2000000000000002
Getting 0.6000000000000001
Getting 1.0
Getting 1.4000000000000001
Getting 1.8
Data saved.
[0.4, 0.8, 1.2000000000000002, 0.6000000000000001, 1.0, 1.4000000000000001, 1.8]
</code></pre>
<h2 id="heading-asyncio">Asyncio</h2>
<ul>
<li><p>Asyncio does not use multiprocessing. It also does not rely on multithreading by default, although it can use threads in specific situations, which we will discuss later.</p>
</li>
<li><p>Instead, asyncio achieves concurrency using an <strong>event loop</strong>. The event loop runs multiple <strong>coroutines</strong> together by scheduling them and switching between them whenever a coroutine performs a <strong>non-blocking operation</strong> and awaits it, instead of blocking execution.</p>
</li>
<li><p>In simple terms, coroutines cooperatively yield control, allowing other coroutines to run during I/O waits, which reduces idle time.</p>
</li>
<li><p>The <code>asyncio</code> library provides the event loop, coroutines, tasks and futures that work together with the <code>async</code> and <code>await</code> syntax built into Python.</p>
</li>
<li><p>When we need to run a blocking operation inside an asyncio program, asyncio provides a way to execute it in a separate thread so the event loop doesn’t get blocked. Internally, asyncio submits these tasks to a <code>ThreadPoolExecutor</code> (similar to what we saw earlier in the multithreading example).</p>
</li>
<li><p><code>aiohttp</code> is commonly used alongside <code>asyncio</code> as an asynchronous HTTP client and server framework. It is ideal for building RESTful APIs, handling a large number of concurrent network connections, and performing tasks like web scraping without blocking the event loop.</p>
</li>
</ul>
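<p>Before the demo, the cooperative scheduling described above can be sketched minimally; here <code>asyncio.sleep</code> stands in for a non-blocking I/O wait:</p>

```python
import asyncio


async def fetch(name: str, delay: float):
    # awaiting a non-blocking operation suspends this coroutine
    # and hands control back to the event loop
    await asyncio.sleep(delay)
    return name


async def main():
    # both coroutines make progress on a single thread;
    # total runtime is roughly max(0.2, 0.1), not the sum
    results = await asyncio.gather(fetch('a', 0.2), fetch('b', 0.1))
    print(results)  # ['a', 'b']


asyncio.run(main())
```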
<h3 id="heading-demo-2">Demo</h3>
<p><strong>Asyncio Anime API:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> aiohttp
<span class="hljs-keyword">import</span> asyncio


<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">AsyncAnimeAPI</span>:</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        self.query = <span class="hljs-string">'''
query ($id: Int) {
  Media (id: $id, type: ANIME) {
    id
    title {
      romaji
      english
      native
    }
  }
}
'''</span>
        self.url = <span class="hljs-string">'https://graphql.anilist.co'</span>

    <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">response</span>(<span class="hljs-params">self, anime_id: int</span>):</span>
        <span class="hljs-string">"""Async method to fetch anime data"""</span>
        variables = {
            <span class="hljs-string">'id'</span>: anime_id
        }

        <span class="hljs-keyword">try</span>:

          <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> aiohttp.ClientSession() <span class="hljs-keyword">as</span> session:
              <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> session.post(
                  self.url, 
                  json={<span class="hljs-string">'query'</span>: self.query, <span class="hljs-string">'variables'</span>: variables}
              ) <span class="hljs-keyword">as</span> response:

                  <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> response.json()

        <span class="hljs-keyword">except</span> asyncio.TimeoutError:
            <span class="hljs-keyword">return</span> {<span class="hljs-string">'error'</span>: <span class="hljs-string">'Timeout'</span>, <span class="hljs-string">'anime_id'</span>: anime_id}
        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            <span class="hljs-keyword">return</span> {<span class="hljs-string">'error'</span>: str(e), <span class="hljs-string">'anime_id'</span>: anime_id}
</code></pre>
<p><code>timedec</code> <strong>decorator to calculate execution time:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> functools <span class="hljs-keyword">import</span> wraps
<span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">import</span> time


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">timedec</span>(<span class="hljs-params">operation_name:str</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">decorator</span>(<span class="hljs-params">func</span>):</span>
        <span class="hljs-comment"># Coroutine functions need an async wrapper: a sync wrapper would only</span>
        <span class="hljs-comment"># measure how long it takes to create the coroutine object</span>
        <span class="hljs-comment"># (microseconds), not the awaited execution itself.</span>
        <span class="hljs-keyword">if</span> asyncio.iscoroutinefunction(func):
<span class="hljs-meta">            @wraps(func)</span>
            <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">async_wrapper</span>(<span class="hljs-params">*args, **kwargs</span>):</span>
                start = time.perf_counter()
                result = <span class="hljs-keyword">await</span> func(*args, **kwargs)
                end = time.perf_counter()
                print(<span class="hljs-string">f"Total time taken for <span class="hljs-subst">{operation_name}</span>: <span class="hljs-subst">{end - start}</span>."</span>)
                <span class="hljs-keyword">return</span> result
            <span class="hljs-keyword">return</span> async_wrapper

<span class="hljs-meta">        @wraps(func)</span>
        <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">wrapper</span>(<span class="hljs-params">*args, **kwargs</span>):</span>
            start = time.perf_counter()
            result = func(*args, **kwargs)
            end = time.perf_counter()
            print(<span class="hljs-string">f"Total time taken for <span class="hljs-subst">{operation_name}</span>: <span class="hljs-subst">{end - start}</span>."</span>)
            <span class="hljs-keyword">return</span> result
        <span class="hljs-keyword">return</span> wrapper
    <span class="hljs-keyword">return</span> decorator
</code></pre>
<p><strong>Asyncio implementation:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">from</span> api <span class="hljs-keyword">import</span> AnimeAPI
<span class="hljs-keyword">from</span> asyncio_api <span class="hljs-keyword">import</span> AsyncAnimeAPI
<span class="hljs-keyword">from</span> timedecorator <span class="hljs-keyword">import</span> timedec


<span class="hljs-meta">@timedec('synchronous execution')</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">sync_execution</span>(<span class="hljs-params">anime_ids:list</span>):</span>

    anime_api = AnimeAPI()

    <span class="hljs-keyword">for</span> anime_id <span class="hljs-keyword">in</span> anime_ids:
        <span class="hljs-keyword">try</span>:
            _ = anime_api.response(anime_id)
        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            _ = e


<span class="hljs-meta">@timedec('asynchronous execution')</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">async_execution</span>(<span class="hljs-params">anime_ids:list</span>):</span>

    anime_api = AsyncAnimeAPI()

    tasks = [anime_api.response(idx) <span class="hljs-keyword">for</span> idx <span class="hljs-keyword">in</span> anime_ids]
    _ = <span class="hljs-keyword">await</span> asyncio.gather(*tasks)


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    anime_ids = [<span class="hljs-number">100</span>,<span class="hljs-number">200</span>,<span class="hljs-number">300</span>,<span class="hljs-number">400</span>, <span class="hljs-number">700</span>, <span class="hljs-number">628</span>, <span class="hljs-number">524</span>, <span class="hljs-number">377</span>, <span class="hljs-number">826</span>, <span class="hljs-number">451</span>, <span class="hljs-number">280</span>, <span class="hljs-number">395</span>, <span class="hljs-number">399</span>, <span class="hljs-number">124</span>, <span class="hljs-number">626</span>]
    sync_execution(anime_ids)
    asyncio.run(async_execution(anime_ids))
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-bash">Total time taken <span class="hljs-keyword">for</span> synchronous execution: 7.806094500003383.
Total time taken <span class="hljs-keyword">for</span> asynchronous execution: 3.0999071896076202e-06.
</code></pre>
<p><strong>When working with asyncio, blocking functions can freeze the event loop.</strong><br /><code>asyncio.to_thread()</code> lets you run those blocking operations in a separate thread, so your async program remains responsive.<br />Demo:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pprint <span class="hljs-keyword">import</span> pprint
<span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">from</span> api <span class="hljs-keyword">import</span> AnimeAPI <span class="hljs-comment">#Refer to this API code in 'Multithreading' section</span>


<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_response</span>(<span class="hljs-params">api:AnimeAPI, anime_id:int</span>):</span>
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> asyncio.to_thread(api.response, anime_id)


<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>

    taskgroup_anime_ids = [<span class="hljs-number">100</span>,<span class="hljs-number">200</span>,<span class="hljs-number">300</span>,<span class="hljs-number">400</span>,<span class="hljs-number">600</span>]

    anime_api = AnimeAPI()

    <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> asyncio.TaskGroup() <span class="hljs-keyword">as</span> tg:
        tasks = [tg.create_task(get_response(anime_api, idx)) <span class="hljs-keyword">for</span> idx <span class="hljs-keyword">in</span> taskgroup_anime_ids]

    results = [task.result() <span class="hljs-keyword">for</span> task <span class="hljs-keyword">in</span> tasks]

    pprint(results)


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    asyncio.run(main())
</code></pre>
<pre><code class="lang-bash">[{<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 100,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: <span class="hljs-string">'Prétear: The New Legend of Snow '</span>
                                          <span class="hljs-string">'White'</span>,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'新白雪姫伝説プリーティア'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Shin Shirayuki-hime Densetsu '</span>
                                         <span class="hljs-string">'Pretear'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 200,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: None,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'天使な小生意気'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Tenshi na Konamaiki'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 300,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: <span class="hljs-string">'3x3 Eyes'</span>,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'3×3EYES'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'3x3 EYES'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 400,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: <span class="hljs-string">'Outlaw Star'</span>,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'星方武侠アウトロースター'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Seihou Bukyou Outlaw Star'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 600,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: None,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'レジェンドオブ・デュオ'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Legend of Duo'</span>}}}}]
</code></pre>
<p><strong>TaskGroup:</strong><br /><code>asyncio.TaskGroup</code> provides structured concurrency in Python. Instead of manually creating and managing tasks, a TaskGroup groups related asynchronous operations together and guarantees that they are all tracked, awaited, and cleaned up safely.<br />When you create tasks inside a TaskGroup, all of them run concurrently, and the block does not exit until every task completes. If any task raises an exception, the TaskGroup automatically cancels the remaining tasks and propagates the error in a predictable way. This prevents “orphan” background tasks, missing results, and silent failures that commonly occur when managing tasks manually with <code>create_task()</code>.<br />In simple terms, TaskGroup makes asynchronous code safer and more reliable by enforcing lifecycle management for tasks, so you do not have to do it yourself.</p>
<p><strong>gather vs TaskGroup:</strong><br /><code>asyncio.gather()</code> runs multiple coroutines and waits for their results, but its error handling is looser: by default, the first exception propagates while the remaining tasks keep running in the background.<br /><code>TaskGroup</code> provides <strong>structured concurrency</strong>: tasks belong to a group, a failure in one cancels its siblings, errors are propagated predictably, and no task is accidentally left running in the background.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pprint <span class="hljs-keyword">import</span> pprint
<span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">from</span> asyncio_api <span class="hljs-keyword">import</span> AsyncAnimeAPI


<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>

    taskgroup_anime_ids = [<span class="hljs-number">100</span>,<span class="hljs-number">200</span>,<span class="hljs-number">300</span>,<span class="hljs-number">400</span>]
    independent_anime_ids = [<span class="hljs-number">600</span>,<span class="hljs-number">853</span>]

    anime_api = AsyncAnimeAPI()

    <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> asyncio.TaskGroup() <span class="hljs-keyword">as</span> tg:
        tasks = [tg.create_task(anime_api.response(idx)) <span class="hljs-keyword">for</span> idx <span class="hljs-keyword">in</span> taskgroup_anime_ids]

    results = [task.result() <span class="hljs-keyword">for</span> task <span class="hljs-keyword">in</span> tasks]

    tasks2 = [asyncio.create_task(anime_api.response(idx)) <span class="hljs-keyword">for</span> idx <span class="hljs-keyword">in</span> independent_anime_ids]

    results.extend(<span class="hljs-keyword">await</span> asyncio.gather(*tasks2))

    pprint(results)


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    asyncio.run(main())
</code></pre>
<pre><code class="lang-bash">[{<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 100,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: <span class="hljs-string">'Prétear: The New Legend of Snow '</span>
                                          <span class="hljs-string">'White'</span>,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'新白雪姫伝説プリーティア'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Shin Shirayuki-hime Densetsu '</span>
                                         <span class="hljs-string">'Pretear'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 200,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: None,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'天使な小生意気'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Tenshi na Konamaiki'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 300,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: <span class="hljs-string">'3x3 Eyes'</span>,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'3×3EYES'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'3x3 EYES'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 400,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: <span class="hljs-string">'Outlaw Star'</span>,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'星方武侠アウトロースター'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Seihou Bukyou Outlaw Star'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 600,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: None,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'レジェンドオブ・デュオ'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Legend of Duo'</span>}}}},
 {<span class="hljs-string">'data'</span>: {<span class="hljs-string">'Media'</span>: {<span class="hljs-string">'id'</span>: 853,
                     <span class="hljs-string">'title'</span>: {<span class="hljs-string">'english'</span>: <span class="hljs-string">'Ouran High School Host Club'</span>,
                               <span class="hljs-string">'native'</span>: <span class="hljs-string">'桜蘭高校ホスト部'</span>,
                               <span class="hljs-string">'romaji'</span>: <span class="hljs-string">'Ouran Koukou Host Club'</span>}}}}]
</code></pre>
<p><strong>Some more useful concepts in</strong> <code>asyncio</code> <strong>you can refer to:</strong></p>
<p><strong>Semaphore</strong><br />A semaphore is used to limit how many coroutines can run a specific piece of code at the same time. This is useful when hitting APIs with rate limits, restricting database connections, or controlling access to limited resources.<br />You acquire it before running a task and release it when done. If the limit is reached, other coroutines wait.<br />In short: <strong>Semaphore = controlled concurrency instead of unlimited concurrency.</strong></p>
<p><strong>AnimeAPI integrated with Semaphore:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">from</span> asyncio_api <span class="hljs-keyword">import</span> AsyncAnimeAPI


<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">SemaphoreDemo</span>(<span class="hljs-params">AsyncAnimeAPI</span>):</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, sem: asyncio.Semaphore</span>):</span>
        super().__init__()
        self.sem = sem

    <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">worker</span>(<span class="hljs-params">self, anime_id: int</span>):</span>
        print(<span class="hljs-string">f"Task <span class="hljs-subst">{anime_id}</span>: Waiting for semaphore..."</span>)

        <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> self.sem:
            print(<span class="hljs-string">f"Task <span class="hljs-subst">{anime_id}</span>: Acquired semaphore, starting request"</span>)
            result = <span class="hljs-keyword">await</span> self.response(anime_id)
            print(<span class="hljs-string">f"Task <span class="hljs-subst">{anime_id}</span>: Request completed"</span>)

        print(<span class="hljs-string">f"Task <span class="hljs-subst">{anime_id}</span>: Released semaphore"</span>)
        <span class="hljs-keyword">return</span> result
</code></pre>
<p><strong>Implementing Semaphore with value 2:</strong><br />This will allow only 2 operations to run concurrently at a time.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">from</span> sem <span class="hljs-keyword">import</span> SemaphoreDemo


<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    semaphore = asyncio.Semaphore(<span class="hljs-number">2</span>)
    demo = SemaphoreDemo(semaphore)

    anime_ids = [<span class="hljs-number">100</span>,<span class="hljs-number">200</span>,<span class="hljs-number">300</span>,<span class="hljs-number">400</span>]

    <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> asyncio.TaskGroup() <span class="hljs-keyword">as</span> tg:
        tasks = [tg.create_task(demo.worker(idx)) <span class="hljs-keyword">for</span> idx <span class="hljs-keyword">in</span> anime_ids]

    _ = [task.result() <span class="hljs-keyword">for</span> task <span class="hljs-keyword">in</span> tasks]


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
    asyncio.run(main())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-bash">Task 100: Waiting <span class="hljs-keyword">for</span> semaphore...
Task 100: Acquired semaphore, starting request
Task 200: Waiting <span class="hljs-keyword">for</span> semaphore...
Task 200: Acquired semaphore, starting request
Task 300: Waiting <span class="hljs-keyword">for</span> semaphore...
Task 400: Waiting <span class="hljs-keyword">for</span> semaphore...
Task 200: Request completed
Task 200: Released semaphore
Task 300: Acquired semaphore, starting request
Task 100: Request completed
Task 100: Released semaphore
Task 400: Acquired semaphore, starting request
Task 300: Request completed
Task 300: Released semaphore
Task 400: Request completed
Task 400: Released semaphore
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">The output above shows how the Semaphore limits execution to 2 concurrent operations, as configured.</div>
</div>

<p><strong>Lock</strong><br />A lock ensures that only one coroutine accesses a shared resource at a time. This prevents race conditions when multiple coroutines try to modify shared state.</p>
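<p>A minimal, self-contained sketch (the shared <code>counter</code> is illustrative, not part of the anime API above) of a lock preventing a lost-update race:</p>
<pre><code class="lang-python">import asyncio

counter = 0

async def increment(lock: asyncio.Lock):
    global counter
    async with lock:              # only one coroutine mutates the counter at a time
        current = counter
        await asyncio.sleep(0)    # yield control; without the lock this is a classic race
        counter = current + 1

async def main():
    lock = asyncio.Lock()
    await asyncio.gather(*(increment(lock) for _ in range(100)))

asyncio.run(main())
print(counter)  # 100
</code></pre>
<p>Remove the <code>async with lock:</code> line and some increments get lost, because another coroutine reads <code>counter</code> between the read and the write.</p>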
<p><strong>Event</strong><br />An event is a signaling mechanism. One coroutine can set an event, and others waiting on it will resume. Useful for coordination between tasks.</p>
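<p>A tiny illustration (hypothetical names, not the API class from earlier) of one coroutine releasing several waiters with a single <code>set()</code>:</p>
<pre><code class="lang-python">import asyncio

resumed = []

async def waiter(event: asyncio.Event, name: str):
    await event.wait()            # suspend until some coroutine calls event.set()
    resumed.append(name)

async def main():
    event = asyncio.Event()
    tasks = [asyncio.create_task(waiter(event, f"task-{i}")) for i in range(3)]
    await asyncio.sleep(0)        # let the waiters reach event.wait()
    event.set()                   # one signal wakes every waiter
    await asyncio.gather(*tasks)

asyncio.run(main())
print(sorted(resumed))  # ['task-0', 'task-1', 'task-2']
</code></pre>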
<p><strong>Queue</strong><br /><code>asyncio.Queue</code> is designed for asynchronous producer–consumer pipelines. Producers put items in the queue, consumers await them. It provides built-in backpressure and prevents uncontrolled task growth.</p>
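<p>A minimal producer–consumer sketch (names are illustrative) showing the sentinel pattern, and how a small <code>maxsize</code> gives backpressure: <code>put()</code> suspends until the consumer frees a slot.</p>
<pre><code class="lang-python">import asyncio

async def producer(queue: asyncio.Queue):
    for i in range(5):
        await queue.put(i)        # suspends when the queue is full: built-in backpressure
    await queue.put(None)         # sentinel telling the consumer to stop

async def consumer(queue: asyncio.Queue):
    results = []
    while True:
        item = await queue.get()
        if item is None:
            break
        results.append(item * 2)
    return results

async def main():
    queue = asyncio.Queue(maxsize=2)   # a deliberately small buffer forces flow control
    prod = asyncio.create_task(producer(queue))
    results = await consumer(queue)
    await prod
    return results

doubled = asyncio.run(main())
print(doubled)  # [0, 2, 4, 6, 8]
</code></pre>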
<p><strong>Cancellation &amp; Timeouts</strong><br />Asyncio supports cooperative cancellation. Tasks should be written to handle cancellation cleanly. <code>asyncio.wait_for()</code> or timeouts on APIs ensure long-running tasks don’t freeze the system.</p>
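<p>A hedged sketch (the <code>slow_operation</code> coroutine is a stand-in, not from the API wrapper above) of how <code>asyncio.wait_for()</code> turns a hung call into a handled timeout:</p>
<pre><code class="lang-python">import asyncio

async def slow_operation():
    await asyncio.sleep(10)       # stands in for a long-running request
    return "done"

async def main():
    try:
        return await asyncio.wait_for(slow_operation(), timeout=0.1)
    except asyncio.TimeoutError:  # wait_for cancels the task, then raises
        return "timed out"

outcome = asyncio.run(main())
print(outcome)  # timed out
</code></pre>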
<p><strong>Backpressure and Flow Control</strong><br />Asyncio doesn’t magically solve overload problems. Use queues, semaphores, and proper design to avoid overwhelming external systems or your own application.</p>
<hr />
<h2 id="heading-comparison-table-for-the-concurrency-models">Comparison table for the concurrency models</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Aspect</td><td><strong>Threading</strong></td><td><strong>Multiprocessing</strong></td><td><strong>Asyncio</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Best For</strong></td><td>I/O-bound tasks using blocking libraries</td><td>CPU-bound tasks needing true parallelism</td><td>Massive I/O concurrency using async APIs</td></tr>
<tr>
<td><strong>Execution Model</strong></td><td>Multiple OS threads in one process</td><td>Multiple independent processes</td><td>Single thread, event loop, cooperative multitasking</td></tr>
<tr>
<td><strong>GIL Impact</strong></td><td>Affected by GIL → only one thread executes Python bytecode at a time</td><td>Not affected (each process has its own interpreter &amp; GIL)</td><td>Not affected in typical usage; only one coroutine runs at a time by design</td></tr>
<tr>
<td><strong>Parallel CPU Execution</strong></td><td>❌ No real parallelism for Python code</td><td>✅ True parallelism across CPU cores</td><td>❌ No (unless explicitly offloading work)</td></tr>
<tr>
<td><strong>I/O Handling</strong></td><td>Works well because many I/O ops release GIL internally</td><td>Works but heavyweight for I/O</td><td>Best for non-blocking I/O (<code>await</code>)</td></tr>
<tr>
<td><strong>Scalability</strong></td><td>Moderate (dozens to hundreds of threads)</td><td>Good but expensive (RAM + process startup)</td><td>Excellent (thousands of concurrent tasks)</td></tr>
<tr>
<td><strong>Memory Usage</strong></td><td>Low to moderate</td><td>High (separate memory per process)</td><td>Very low</td></tr>
<tr>
<td><strong>Complexity</strong></td><td>Simple mental model</td><td>Moderate; requires IPC and data serialization</td><td>Higher learning curve; requires async-aware ecosystem</td></tr>
<tr>
<td><strong>Good Use Cases</strong></td><td>File I/O, blocking network calls, background workers, GUI apps</td><td>Data processing, ML workloads, CPU crunching, parallel computation</td><td>APIs, web servers, scraping at scale, chat apps, streaming</td></tr>
<tr>
<td><strong>Bad Use Cases</strong></td><td>CPU-bound workloads</td><td>Extremely lightweight tasks that don’t justify process overhead</td><td>CPU-bound work or blocking libraries</td></tr>
<tr>
<td><strong>Typical Tools</strong></td><td><code>threading</code>, <code>ThreadPoolExecutor</code></td><td><code>multiprocessing</code>, <code>ProcessPoolExecutor</code></td><td><code>asyncio</code>, <code>aiohttp</code>, async DB drivers</td></tr>
<tr>
<td><strong>Failure Mode if Misused</strong></td><td>Thread explosion, overhead</td><td>High overhead, complex debugging, shared state pain</td><td>Event loop freezes if blocking code sneaks in</td></tr>
</tbody>
</table>
</div><h2 id="heading-quick-rule-of-thumb-choose-concurrency-model-for-my-task"><strong>Quick rule of thumb:</strong> choosing a concurrency model for your task</h2>
<ul>
<li><p><strong>I/O + blocking libs + manageable concurrency</strong> → Threading</p>
</li>
<li><p><strong>CPU-bound, need real parallel speedup</strong> → Multiprocessing</p>
</li>
<li><p><strong>Huge I/O concurrency + async ecosystem available</strong> → Asyncio</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Zenalyze: My AI-Assisted Data Analysis Tool (And Why I Built It)]]></title><description><![CDATA[Most AI “data analysis” tools today fall into two groups:

They pretend to analyze your data but don’t actually run code.

They demand you upload your data to some cloud black box.


Neither works for real-world analytics.
I wanted something differen...]]></description><link>https://techtrail.tuhindutta.com/zenalyze-ai-assisted-data-analysis-tool</link><guid isPermaLink="true">https://techtrail.tuhindutta.com/zenalyze-ai-assisted-data-analysis-tool</guid><category><![CDATA[Analytics Agent]]></category><category><![CDATA[AI]]></category><category><![CDATA[llm]]></category><category><![CDATA[Python]]></category><category><![CDATA[data analysis]]></category><category><![CDATA[generative ai]]></category><category><![CDATA[Developer Tools]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[AI Coding Assistant]]></category><category><![CDATA[Productivity]]></category><category><![CDATA[python libraries]]></category><category><![CDATA[pandas]]></category><category><![CDATA[PySpark]]></category><category><![CDATA[automation]]></category><dc:creator><![CDATA[Tuhin Kumar Dutta]]></dc:creator><pubDate>Mon, 17 Nov 2025 16:27:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/-WXQm_NTK0U/upload/3b5d0ceaab468bde4c2a6379e497e0d9.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most AI “data analysis” tools today fall into two groups:</p>
<ol>
<li><p><strong>They pretend to analyze your data but don’t actually run code.</strong></p>
</li>
<li><p><strong>They demand you upload your data to some cloud black box.</strong></p>
</li>
</ol>
<p>Neither works for real-world analytics.</p>
<p>I wanted something different.<br />Something that could sit right in my local environment, understand my tables, generate real Python code, execute it, and help me explore data the same way an actual teammate would.</p>
<p>That’s where <strong>Zenalyze</strong> came from — a lightweight package that turns LLMs into a practical coding partner <em>without ever exposing your actual data values</em>.</p>
<p><a target="_blank" href="https://tuhindutta.github.io/Zenalyze/"><strong><em>GitHub Package Documentation</em></strong></a></p>
<p>Let me walk you through the motivation, design thinking, and how it fits into a real workflow.</p>
<hr />
<h2 id="heading-the-problem-i-wanted-to-solve">🧩 The Problem I Wanted to Solve</h2>
<p>Anyone working with Pandas or PySpark knows the cycle:</p>
<ul>
<li><p>Load data</p>
</li>
<li><p>Look at shapes, missing values, weird fields</p>
</li>
<li><p>Write a bunch of boilerplate</p>
</li>
<li><p>Rinse and repeat for every analysis step</p>
</li>
</ul>
<p>And every time you want to try something new, you end up rewriting the same code:</p>
<pre><code class="lang-python">df.groupby(...).agg(...)
df.merge(...)
df.plot(...)
</code></pre>
<p>I wanted a tool that handled this repetitive side of analysis, while still letting me remain in control of the code. Something that generates real Python, runs in my own environment, and behaves predictably.</p>
<hr />
<h2 id="heading-the-motivation-behind-zenalyze">🎯 The Motivation Behind Zenalyze</h2>
<p>A few core ideas shaped the project:</p>
<h3 id="heading-1-llms-should-help-you-code-not-replace-your-environment">1. <strong>LLMs should help you code, not replace your environment</strong></h3>
<p>I didn’t want a chatbot that tells me what <em>could</em> work.<br />I wanted a companion that writes actual code I can run right away.</p>
<h3 id="heading-2-your-data-never-leaves-your-machine">2. <strong>Your data never leaves your machine</strong></h3>
<p>If you’re analyzing customer revenue, fraud records, supply chain data, medical outcomes — the last thing you want is your rows flying off into the internet.</p>
<p>Zenalyze only sends <strong>metadata</strong>, not data.</p>
<h3 id="heading-3-history-aware-analysis">3. <strong>History-aware analysis</strong></h3>
<p>LLMs forget.<br />Data analysts don’t have time to babysit them.</p>
<p>Zenalyze:</p>
<ul>
<li><p>tracks every step</p>
</li>
<li><p>remembers derived columns</p>
</li>
<li><p>summarizes past actions</p>
</li>
<li><p>reuses existing variables</p>
</li>
<li><p>never re-imports Pandas/Spark unnecessarily</p>
</li>
</ul>
<p>So the conversation stays consistent, and the code becomes cleaner over time.</p>
<h3 id="heading-4-make-the-experience-fun">4. <strong>Make the experience fun</strong></h3>
<p>I didn’t want another “heavy enterprise tool”.<br />Just a friendly, intelligent coding buddy in my notebook.</p>
<hr />
<h2 id="heading-security-close-to-your-data-never-inside-it">🔐 Security: Close to Your Data, Never Inside It</h2>
<p>One thing I was very firm about:<br /><strong>Zenalyze should never see raw data.</strong></p>
<p>And it doesn’t.</p>
<p>It only extracts and uses:</p>
<ul>
<li><p>column names</p>
</li>
<li><p>descriptions</p>
</li>
<li><p>data types</p>
</li>
<li><p>row/column counts</p>
</li>
<li><p>null percentages</p>
</li>
<li><p>high-level distributions</p>
</li>
<li><p>patterns</p>
</li>
<li><p>derived fields created in earlier steps</p>
</li>
</ul>
<p>This is enough to provide context for the LLM to generate correct code, but not enough to reveal anything sensitive.</p>
<p>Think of it as letting someone read your database schema without giving them access to the rows.</p>
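<p>As a rough illustration of the idea (not Zenalyze’s actual internals), a schema-only payload like this could be assembled with plain pandas:</p>
<pre><code class="lang-python">import pandas as pd

def extract_metadata(df):
    """Collect schema-level facts only; no row values leave this function."""
    return {
        "columns": list(df.columns),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "n_rows": len(df),
        "n_cols": df.shape[1],
        "null_pct": (df.isna().mean() * 100).round(2).to_dict(),
    }

df = pd.DataFrame({"customer_id": [1, 2, None], "region": ["EU", "US", "EU"]})
meta = extract_metadata(df)
print(meta["null_pct"])  # {'customer_id': 33.33, 'region': 0.0}
</code></pre>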
<h3 id="heading-but-use-it-responsibly">But use it responsibly.</h3>
<p>Even though Zenalyze never touches actual records, good practice is to run it in an <strong>isolated and monitored environment</strong>:</p>
<ul>
<li><p>Jupyter inside a virtual environment</p>
</li>
<li><p>Controlled outbound/inbound network rules</p>
</li>
<li><p>No access to production systems</p>
</li>
<li><p>Zero trust toward external LLMs you didn’t configure</p>
</li>
</ul>
<p>Because yes — while Zenalyze won’t misbehave on its own, a malicious or rogue LLM can try to generate harmful code.<br />Not likely unless someone purposely built an LLM for chaos, but still worth mentioning.</p>
<p>Smart tools deserve smart environments.</p>
<hr />
<h2 id="heading-what-zenalyze-actually-does">🤝 What Zenalyze Actually Does</h2>
<p>When you interact with it:</p>
<pre><code class="lang-python">zen.do(<span class="hljs-string">"calculate total revenue per customer"</span>)
</code></pre>
<p>It does a few things behind the scenes:</p>
<ul>
<li><p>builds a detailed prompt with metadata</p>
</li>
<li><p>injects the correct dataset references</p>
</li>
<li><p>generates the Python code</p>
</li>
<li><p>executes the code right in your environment</p>
</li>
<li><p>saves the result as a variable</p>
</li>
<li><p>remembers what you just did</p>
</li>
<li><p>lets you ask follow-up questions through the buddy:</p>
</li>
</ul>
<pre><code class="lang-python">zen.buddy(<span class="hljs-string">"Explain what we did in the last step"</span>)
</code></pre>
<p>It’s smooth, predictable, and feels like working with a junior analyst who never gets tired.</p>
<hr />
<h2 id="heading-why-its-called-zenalyze">🧘 Why It's Called <em>Zen</em>alyze</h2>
<p>Because the tool’s job is to take the chaotic part of exploratory analysis — the constant back-and-forth, the rewriting, the checking, the clutter — and make it calm, clean, and focused.</p>
<p>Data work shouldn’t feel like fighting your tools.<br />It should feel like thinking clearly.</p>
<p>That’s the vibe.</p>
<hr />
<h2 id="heading-setup-amp-environment-notes">🛠️ Setup &amp; Environment Notes</h2>
<p>Before we get into the demo, a few practical reminders:</p>
<ul>
<li><p>Always use a <code>.env</code> file for API keys</p>
</li>
<li><p>Keep your environment isolated (venv/conda)</p>
</li>
<li><p>Monitor outbound connections</p>
</li>
<li><p>Use secure LLM providers you trust</p>
</li>
<li><p>Keep datasets local or on controlled Spark clusters</p>
</li>
</ul>
<p>Zenalyze integrates tightly with Pandas and PySpark, so as long as your environment is tidy, the experience will be clean.</p>
<hr />
<h2 id="heading-installation">📦 Installation</h2>
<p>Once the package is on PyPI:</p>
<pre><code class="lang-bash">pip install zenalyze
</code></pre>
<p>Or directly from GitHub:</p>
<pre><code class="lang-bash">pip install git+https://github.com/tuhindutta/Zenalyze.git
</code></pre>
<hr />
<h2 id="heading-demo-time">🚀 Demo Time</h2>
<p>Now let’s actually use Zenalyze and see what it feels like in a real environment.</p>
<p>Don’t worry — this part is straightforward. No complicated infra, no scary configs.<br />Just a clean Python setup and a couple of environment variables.</p>
<hr />
<h3 id="heading-1-create-a-virtual-environment">1️⃣ Create a Virtual Environment</h3>
<p>Always start in a clean workspace. It keeps things tidy and avoids package mess.</p>
<pre><code class="lang-bash">python -m venv .venv
source .venv/bin/activate      # Mac/Linux
# or
.\.venv\Scripts\activate       # Windows
</code></pre>
<p>You should now see <code>(.venv)</code> in your terminal prompt.</p>
<hr />
<h3 id="heading-2-install-zenalyze">2️⃣ Install Zenalyze</h3>
<p>If you installed from GitHub:</p>
<pre><code class="lang-bash">pip install git+https://github.com/tuhindutta/Zenalyze.git
</code></pre>
<p>Once it's on PyPI, you'll switch to:</p>
<pre><code class="lang-bash">pip install zenalyze
</code></pre>
<hr />
<h3 id="heading-3-add-your-environment-variables">3️⃣ Add Your Environment Variables</h3>
<p>Zenalyze uses three environment variables.<br />You can put them in a <code>.env</code> file, export them directly, or load them through your preferred method.</p>
<p><strong>Required / Optional Env Variables</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Variable</td><td>Purpose</td><td>Default</td></tr>
</thead>
<tbody>
<tr>
<td><code>MODEL</code></td><td>Main LLM for code generation</td><td><code>openai/gpt-oss-120b</code></td></tr>
<tr>
<td><code>GROQ_API_KEY</code></td><td>API key for your LLM provider</td><td><strong>none</strong> (must provide if using Groq)</td></tr>
<tr>
<td><code>BUDDY_MODEL</code></td><td>LLM for natural-language buddy responses</td><td><code>openai/gpt-oss-120b</code></td></tr>
<tr>
<td><code>CODE_SUMMARIZER_MODEL</code></td><td>LLM for summarizing long code histories</td><td><code>openai/gpt-oss-120b</code></td></tr>
</tbody>
</table>
</div><p><strong>Example</strong> <code>.env</code> <strong>file</strong></p>
<p>Create a file named <code>.env</code> in your project folder:</p>
<pre><code class="lang-bash">MODEL=openai/gpt-oss-120b
BUDDY_MODEL=openai/gpt-oss-120b
CODE_SUMMARIZER_MODEL=openai/gpt-oss-120b

GROQ_API_KEY=your_groq_key_here
</code></pre>
<p>Load it using python-dotenv (optional but convenient)</p>
<pre><code class="lang-bash">pip install python-dotenv
</code></pre>
<p>In a notebook or script:</p>
<pre><code class="lang-python">from dotenv import load_dotenv
load_dotenv()
</code></pre>
<p>And you’re good to go.</p>
<hr />
<h3 id="heading-4prepare-your-data-folder-and-optional-description-file">4️⃣Prepare Your Data Folder (and Optional Description File)</h3>
<p>Let’s set up the data Zenalyze will work with.</p>
<p>Start by creating a simple <code>./data</code> directory and drop in a few CSV or Excel files.</p>
<p>Example structure:</p>
<pre><code class="lang-plaintext">project/
 ├── .env
 ├── demo.ipynb
 └── data/
       ├── customers.csv
       ├── orders.csv
       └── desc.json             # optional but highly recommended (discussed below)
</code></pre>
<p>Zenalyze will automatically scan this folder, load the files, and extract metadata like column names, dtypes, null percentages, and patterns.<br />That’s enough for it to start generating clean, context-aware analysis code.</p>
<hr />
<h3 id="heading-optional-but-highly-recommended-add-a-descjson-file">⭐ Optional but Highly Recommended: Add a <code>desc.json</code> File</h3>
<p>If you want Zenalyze to understand <strong>what your tables actually represent</strong> rather than just their structure, you can provide a <code>desc.json</code> file in the working directory.</p>
<p>This file lets you describe, in your own words:</p>
<ul>
<li><p>what each table means</p>
</li>
<li><p>business/domain context</p>
</li>
<li><p>what each column represents</p>
</li>
<li><p>any notes you'd want an analyst to know</p>
</li>
</ul>
<p>There’s no strict formatting rule — you can phrase descriptions however you prefer.<br />The <strong>only requirement</strong> is that the top-level keys match your table names <strong>without file extensions</strong>.</p>
<p>For example:</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"customers"</span>: {
        <span class="hljs-attr">"data_desc"</span>: <span class="hljs-string">"customer master table"</span>,
        <span class="hljs-attr">"columns_desc"</span>: {
            <span class="hljs-attr">"customer_id"</span>: <span class="hljs-string">"unique customer identifier"</span>,
            <span class="hljs-attr">"region"</span>: <span class="hljs-string">"geographical region"</span>
        }
    },

    <span class="hljs-attr">"orders"</span>: {
        <span class="hljs-attr">"data_desc"</span>: <span class="hljs-string">"transaction-level order data"</span>,
        <span class="hljs-attr">"columns_desc"</span>: {
            <span class="hljs-attr">"order_id"</span>: <span class="hljs-string">"unique id for each order"</span>,
            <span class="hljs-attr">"amount"</span>: <span class="hljs-string">"order total value"</span>
        }
    }
}
</code></pre>
<p>Name this file exactly <code>desc.json</code> and place it inside <code>data/</code>.</p>
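<p>To make the one naming rule concrete, here is a hypothetical check (not part of Zenalyze) that verifies the top-level keys of <code>desc.json</code> match the CSV file names with extensions stripped:</p>
<pre><code class="lang-python">import json
import tempfile
from pathlib import Path

def desc_keys_match(data_dir):
    data_path = Path(data_dir)
    desc = json.loads((data_path / "desc.json").read_text())
    table_stems = {p.stem for p in data_path.glob("*.csv")}
    # every top-level key must be a data file name without its extension
    return set(desc).issubset(table_stems)

# tiny self-test with a throwaway folder
with tempfile.TemporaryDirectory() as d:
    Path(d, "customers.csv").write_text("customer_id,region\n")
    Path(d, "orders.csv").write_text("order_id,amount\n")
    Path(d, "desc.json").write_text(json.dumps({"customers": {}, "orders": {}}))
    ok = desc_keys_match(d)
print(ok)  # True
</code></pre>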
<hr />
<h3 id="heading-dont-want-to-write-it-manually">🤝 Don’t Want to Write It Manually?</h3>
<p>Zenalyze can generate a template for you.</p>
<p>Once Zenalyze is initialized, just run:</p>
<pre><code class="lang-python">zen.create_description_template_file(forced=<span class="hljs-literal">True</span>)
</code></pre>
<p>This will create a <code>desc.json</code> template file inside the appropriate destination — you only need to fill in the details and reinitialize the Zenalyze instance.</p>
<hr />
<h3 id="heading-5-initialize-zenalyze">5️⃣ Initialize Zenalyze</h3>
<p>Fire up Jupyter Notebook and inside your notebook:</p>
<pre><code class="lang-python">from zenalyze import create_zenalyze_object_with_env_var_and_last5_hist

zen = create_zenalyze_object_with_env_var_and_last5_hist(globals(), <span class="hljs-string">"./data"</span>)
</code></pre>
<p>This does a lot for you:</p>
<ul>
<li><p>loads your datasets</p>
</li>
<li><p>extracts metadata</p>
</li>
<li><p>sets up history retention</p>
</li>
<li><p>configures the LLM models</p>
</li>
<li><p>prepares your analysis environment</p>
</li>
</ul>
<p>You'll now have variables like <code>customers</code>, <code>orders</code>, etc. injected into your session automatically.</p>
<h3 id="heading-demo-notebookhttpstuhinduttagithubiozenalyzedemopdf"><a target="_blank" href="https://tuhindutta.github.io/Zenalyze/demo.pdf"><strong><em>Demo Notebook</em></strong></a></h3>
<hr />
<h2 id="heading-final-thoughts">🎁 Final Thoughts</h2>
<p>Zenalyze isn’t meant to be another giant enterprise tool with a 100-page manual.</p>
<p>It’s meant to be:</p>
<ul>
<li><p>simple</p>
</li>
<li><p>lightweight</p>
</li>
<li><p>developer-friendly</p>
</li>
<li><p>safe</p>
</li>
<li><p>genuinely helpful</p>
</li>
</ul>
<p>If it makes data exploration even a little bit smoother, cleaner, or more fun — it’s doing its job.</p>
<p>And this is only the beginning.</p>
]]></content:encoded></item><item><title><![CDATA[Machine Learning Experimentation with Airflow]]></title><description><![CDATA[Problem Statement
While experimenting with machine learning models — tuning hyperparameters with Bayesian methods, running cross-validations, and optimizing trials using Optuna with MLflow tracking — I found myself constantly fighting the same proble...]]></description><link>https://techtrail.tuhindutta.com/machine-learning-experimentation-with-airflow</link><guid isPermaLink="true">https://techtrail.tuhindutta.com/machine-learning-experimentation-with-airflow</guid><category><![CDATA[Devops]]></category><category><![CDATA[mlops]]></category><category><![CDATA[apache-airflow]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[CI/CD]]></category><category><![CDATA[#Nexus]]></category><category><![CDATA[Python]]></category><category><![CDATA[mlflow]]></category><category><![CDATA[data-engineering]]></category><category><![CDATA[#model-deployment]]></category><category><![CDATA[automation]]></category><category><![CDATA[experiment tracking]]></category><dc:creator><![CDATA[Tuhin Kumar Dutta]]></dc:creator><pubDate>Tue, 04 Nov 2025 03:29:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762185058285/cf7d9aa7-f356-4d71-b583-f9288b8c6998.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-problem-statement">Problem Statement</h2>
<p>While experimenting with machine learning models — tuning hyperparameters with Bayesian methods, running cross-validations, and optimizing trials using <strong>Optuna</strong> with <strong>MLflow</strong> tracking — I found myself constantly fighting the same problem: <strong>code dependency and fragility</strong>.</p>
<p>Every new experiment required changing code, updating configurations, and risking breakages. Switching datasets or parameters meant tweaking multiple scripts, rerunning environment setups, and occasionally debugging things that weren’t even related to the experiment.</p>
<p>It became clear that I needed a <strong>stable, self-contained environment</strong> that could:</p>
<ul>
<li><p>Handle <strong>ETL and ML pipeline orchestration</strong> automatically.</p>
</li>
<li><p>Let me experiment safely <strong>without touching core code</strong>.</p>
</li>
<li><p>Log and version results in MLflow.</p>
</li>
<li><p>Be reproducible across systems — from my laptop to a CI/CD server.</p>
</li>
</ul>
<p>That realization drove the motivation behind this framework — an <strong>Airflow-based, Jenkins-triggered, Docker-deployed system</strong> designed to give me a ready-to-use experimentation environment.</p>
<hr />
<h2 id="heading-motivation">Motivation</h2>
<p>The idea wasn’t just to automate; it was to <strong>decouple experimentation from infrastructure</strong>. What I wanted instead was <strong>a repeatable and isolated environment</strong> where I could:</p>
<ul>
<li><p>Spin up an <strong>ETL + training pipeline</strong> without touching the core code.</p>
</li>
<li><p><strong>Experiment safely</strong> using parameters or configurations, not manual edits.</p>
</li>
<li><p>Automatically <strong>log, register, and version models</strong> via MLflow.</p>
</li>
<li><p>Keep the entire system <strong>portable and rebuildable</strong> — same behavior on any machine or CI runner.</p>
</li>
<li><p>Move experimentation from “code edits” to <strong>config- and parameter-based triggers</strong>.</p>
</li>
</ul>
<p>Essentially, I wanted the <strong>luxury of iteration speed without the anxiety of setup</strong>.</p>
<p>That’s when I started designing an <strong>Airflow-based framework that could self-deploy via Jenkins</strong>, using <strong>Docker Compose</strong> for orchestration and <strong>Nexus</strong> for controlled dependency management. The result is a <strong>self-contained Airflow framework</strong> designed to:</p>
<ul>
<li><p>Launch via <strong>Jenkins CI/CD</strong> with a single trigger.</p>
</li>
<li><p>Use <strong>Docker Compose</strong> to orchestrate an <strong>Airflow Celery cluster</strong> for distributed tasks.</p>
</li>
<li><p>Dynamically install <strong>custom Python dependencies from a private Nexus repo</strong>, without editing the Docker image manually.</p>
</li>
<li><p>Provide a <strong>ready-to-run sandbox</strong> for model training, ETL pipelines, and experiment tracking.</p>
</li>
</ul>
<hr />
<h2 id="heading-implementation-overview">Implementation Overview</h2>
<p>For those interested in the complete technical setup — including Dockerfile, Jenkins pipeline, and Docker Compose configurations — I’ve documented everything in detail here:<br />🔗 <a target="_blank" href="https://tuhindutta.github.io/airflow-setup-celery-framework/"><strong>Airflow Celery Framework Documentation</strong></a></p>
<p>Below, I’ll focus on how the system works internally — the practical workflow, integration points, and reasoning behind some of the implementation choices.</p>
<h3 id="heading-1-the-jenkins-cicd-pipeline-the-automation-backbone">1. The Jenkins CI/CD Pipeline — The Automation Backbone</h3>
<p>Jenkins is the central automation layer.<br />It eliminates the need for manual Docker builds or direct command-line work.<br />Instead, a user (or a scheduled trigger) starts a build with configurable parameters such as:</p>
<ul>
<li><p><code>NEXUS_URL</code> → private PyPI/Nexus repository URL</p>
</li>
<li><p><code>NEXUS_CREDS_ID</code> → Jenkins credentials ID for Nexus authentication</p>
</li>
<li><p><code>DEV_DIR</code> → target build directory for staging</p>
</li>
<li><p><code>REQUIREMENTS</code> &amp; <code>CUSTOM_REQUIREMENTS</code> → dependency lists to pull at runtime</p>
</li>
</ul>
<p>Once triggered, the pipeline:</p>
<ol>
<li><p><strong>Cleans the workspace</strong> and checks out the repo.</p>
</li>
<li><p><strong>Prepares the environment</strong>, creating the build directory and injecting Nexus credentials as Docker BuildKit secrets.</p>
</li>
<li><p><strong>Fetches dependency files dynamically</strong> (curling them from URLs provided in the parameters).</p>
</li>
<li><p><strong>Builds the Airflow image</strong> using those dependencies and secrets.</p>
</li>
<li><p><strong>Runs</strong> <code>docker compose up -d</code> to bring up all Airflow services.</p>
</li>
<li><p><strong>Cleans sensitive files</strong> (Nexus creds, .env, Dockerfile, compose YAML) to keep the environment safe.</p>
</li>
</ol>
<p>This approach ensures every build is <strong>fresh, reproducible, and isolated</strong>, without needing to manually rebuild or edit Dockerfiles.</p>
<h3 id="heading-2-docker-image-design-parameterized-and-secure">2. Docker Image Design — Parameterized and Secure</h3>
<p>The <strong>Dockerfile</strong> extends the official Airflow image (<code>apache/airflow:3.1.0</code>) and is designed to be <strong>parameterized</strong> rather than static.</p>
<p>It introduces:</p>
<ul>
<li><p><code>ARG INDEX_URL</code> and <code>ENV PYPI_URL</code> for flexible dependency sources.</p>
</li>
<li><p>BuildKit secret mounts (<code>nexus_user</code>, <code>nexus_pass</code>) to inject credentials securely.</p>
</li>
<li><p>Multi-layer installs for clean separation between public and private dependencies.</p>
</li>
</ul>
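<p>As a rough Dockerfile sketch of how those pieces could fit together (illustrative only, not the repository’s actual file; it assumes <code>INDEX_URL</code> carries the host-plus-path of the Nexus simple index, e.g. <code>nexus.example.com/repository/pypi/simple</code>):</p>
<pre><code class="lang-dockerfile"># syntax=docker/dockerfile:1
FROM apache/airflow:3.1.0

ARG INDEX_URL
ENV PYPI_URL=${INDEX_URL}

COPY requirements.txt /tmp/requirements.txt

# BuildKit mounts the secrets for this RUN layer only; they never persist in image layers
RUN --mount=type=secret,id=nexus_user,mode=0444 \
    --mount=type=secret,id=nexus_pass,mode=0444 \
    pip install -r /tmp/requirements.txt \
        --index-url "https://$(cat /run/secrets/nexus_user):$(cat /run/secrets/nexus_pass)@${PYPI_URL}"
</code></pre>
<p>Because the credentials arrive via <code>--mount=type=secret</code> rather than <code>ARG</code>, they appear neither in <code>docker history</code> nor in any cached layer.</p>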
<p>This design means you can:</p>
<ul>
<li><p>Swap dependency sets without modifying the Dockerfile.</p>
</li>
<li><p>Point to different Nexus repositories across environments (dev, staging, prod).</p>
</li>
<li><p>Rebuild instantly from Jenkins with zero code edits.</p>
</li>
</ul>
<p>It’s a true <em>“define once, reuse everywhere”</em> model.</p>
<h3 id="heading-3-dependency-management-via-nexus">3. Dependency Management via Nexus</h3>
<p>Instead of pushing all dependencies to PyPI or including them in the repo, private packages are hosted in <strong>Nexus</strong>.</p>
<p>Here’s how the flow works:</p>
<ol>
<li><p>Jenkins reads <code>NEXUS_CREDS_ID</code> and exposes username/password as Docker secrets.</p>
</li>
<li><p>During the build, Docker mounts these credentials temporarily at <code>/run/secrets</code>.</p>
</li>
<li><p>Pip installs private dependencies using the provided <code>INDEX_URL</code> (from Nexus).</p>
</li>
<li><p>The credentials vanish after build completion — never written to image layers or logs.</p>
</li>
</ol>
<p>This method is both <strong>secure</strong> and <strong>scalable</strong>, enabling enterprise-style dependency control with zero manual interference.</p>
<h3 id="heading-4-execution-lifecycle-summary">4. Execution Lifecycle Summary</h3>
<p>Here’s what a full run looks like:</p>
<pre><code class="lang-plaintext">[1] Jenkins Job Triggered (manual or scheduled)
     ↓
[2] Parameters read → environment prepared
     ↓
[3] Nexus credentials injected securely
     ↓
[4] Docker build starts with secrets + dependency files
     ↓
[5] Custom Airflow image built dynamically
     ↓
[6] docker-compose up -d (Airflow + Redis + Postgres + Flower)
     ↓
[7] Secrets &amp; temp files cleaned
     ↓
[8] Airflow UI accessible → ready for DAGs and experiments
</code></pre>
<h3 id="heading-5-design-priorities">5. Design Priorities</h3>
<p>The system was built with three guiding principles:</p>
<ul>
<li><p><strong>Isolation</strong> — every environment is self-contained and disposable.</p>
</li>
<li><p><strong>Reproducibility</strong> — build once, deploy anywhere, get the same behavior.</p>
</li>
<li><p><strong>Security</strong> — credentials never persist, even in intermediate Docker layers.</p>
</li>
</ul>
<p>These principles make it flexible enough for both individual experiments and team-scale deployments.</p>
<hr />
<h2 id="heading-how-i-used-the-framework">How I Used the Framework</h2>
<p>After building the Airflow–Jenkins–Docker setup, I wanted to validate it with an actual end-to-end ML project. The goal was to see if this framework could <strong>handle real experimentation</strong>, versioning, and deployment workflows — not just spin up containers.</p>
<h3 id="heading-1-building-and-versioning-the-project-package">1. Building and Versioning the Project Package</h3>
<p>I started locally with a project that handled <strong>ETL and model training</strong>, fully integrated with <strong>MLflow</strong> for experiment tracking.<br />Instead of running it as loose scripts, I packaged the entire project into a <strong>Python wheel (</strong><code>.whl</code>) using <code>setuptools</code>.</p>
<p>To automate this:</p>
<ul>
<li><p>I created a <strong>CI/CD pipeline in Jenkins</strong> dedicated to building, versioning, and publishing this wheel.</p>
</li>
<li><p>Every run of the pipeline created a new, versioned artifact (e.g., <code>project_name-0.1.4-py3-none-any.whl</code>).</p>
</li>
<li><p>The wheel, along with all its dependencies, was uploaded to my <strong>private Nexus repository</strong>, making it accessible like any other PyPI package.</p>
</li>
</ul>
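<p>The build-and-publish step can be sketched with standard packaging tools (the repository URL below is a placeholder; the actual pipeline may use different tooling):</p>

```shell
# Build a versioned wheel from the project source
python -m build --wheel        # produces e.g. dist/project_name-0.1.4-py3-none-any.whl

# Publish the artifact to a private Nexus PyPI-hosted repository
twine upload \
  --repository-url https://nexus.example.internal/repository/pypi-internal/ \
  dist/*.whl
```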
<p>For Airflow to access it during runtime, I exposed the Nexus repository securely using <strong>ngrok</strong>, which allowed local or private-network access from the containerized environment.</p>
<h3 id="heading-2-integrating-with-the-airflow-deployment-framework">2. Integrating with the Airflow Deployment Framework</h3>
<p>Once the package was available in Nexus, I used another CI/CD pipeline — the one based on my <strong>Airflow Celery framework</strong> — to automatically:</p>
<ul>
<li><p><strong>Pull the wheel</strong> from Nexus along with any additional dependencies,</p>
</li>
<li><p><strong>Build the custom Airflow image</strong> through the Dockerfile that installs those dependencies dynamically, and</p>
</li>
<li><p><strong>Bring up the entire Airflow environment</strong> via <code>docker compose up -d</code>.</p>
</li>
</ul>
<p>With this, I now had a fully operational and reproducible Airflow setup — built, configured, and ready with just a few clicks.</p>
<h3 id="heading-3-running-training-and-deployment-pipelines">3. Running Training and Deployment Pipelines</h3>
<p>Next, I wrote two DAGs to test the framework’s integration capabilities.</p>
<ul>
<li><p><strong>Training DAG:</strong></p>
<ul>
<li><p>Loads configuration files specifying hyperparameters.</p>
</li>
<li><p>Runs ETL and training steps using the packaged project wheel.</p>
</li>
<li><p>Tracks experiments, metrics, and models using <strong>MLflow</strong>.</p>
</li>
<li><p>Results and artifacts (models, metrics, plots, etc.) appear automatically in the MLflow UI.</p>
</li>
</ul>
</li>
<li><p><strong>Deployment DAG:</strong></p>
<ul>
<li><p>Fetches the required model artifact from a provided <strong>MLflow URI</strong>.</p>
</li>
<li><p>Handles deployment logic — either pushing to cloud, an endpoint, or a designated inference environment.</p>
</li>
</ul>
</li>
</ul>
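<p>Once the environment is up, triggering the two DAGs from the Airflow CLI might look like this (the DAG ids and <code>--conf</code> keys are hypothetical; they depend on how the DAGs are written):</p>

```shell
# Kick off a training run with a specific hyperparameter configuration
airflow dags trigger training_pipeline \
  --conf '{"config_path": "configs/experiment_01.yaml"}'

# Hand a tracked model to the deployment DAG via its MLflow URI
airflow dags trigger deployment_pipeline \
  --conf '{"model_uri": "runs:/<run_id>/model"}'
```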
<p>These two DAGs validated the entire pipeline: from data processing → experiment tracking → artifact management → deployment handoff.</p>
<h3 id="heading-4-the-outcome">4. The Outcome</h3>
<p>By combining these pipelines, I ended up with a <strong>completely automated, reproducible ML experimentation and deployment framework</strong>:</p>
<ul>
<li><p>Jenkins handles build, versioning, and deployment triggers.</p>
</li>
<li><p>Docker + Airflow provide a consistent execution environment.</p>
</li>
<li><p>Nexus acts as the private dependency registry.</p>
</li>
<li><p>MLflow manages experiment tracking and model artifacts.</p>
</li>
</ul>
<p>The best part?<br />It’s modular — I can plug in any ML project following the same structure and get a working Airflow environment with versioned dependencies and clean experiment tracking in minutes.</p>
<hr />
<h2 id="heading-whats-next">What’s Next</h2>
<p>The next part of this series will focus on the <strong>actual project and deployment architecture</strong> — how the training pipeline was structured, how model promotion and validation were handled, and how deployment was automated in the cloud.</p>
<p>For now, this post covers the <strong>framework, setup, and environment</strong> that made all of that possible.</p>
]]></content:encoded></item><item><title><![CDATA[Testing the Nutriguide API]]></title><description><![CDATA[Introduction
An API that uses nutritional data from various products and a user's health status to create a chatbot powered by an LLM. Users can interact with this chatbot to make and plan their dietary choices.
The GitHub Documentation and DockerHub documentation p...]]></description><link>https://techtrail.tuhindutta.com/testing-the-nutriguide-api</link><guid isPermaLink="true">https://techtrail.tuhindutta.com/testing-the-nutriguide-api</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[AI]]></category><category><![CDATA[llm]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Nutrition]]></category><category><![CDATA[APIs]]></category><category><![CDATA[LLaMa]]></category><dc:creator><![CDATA[Tuhin Kumar Dutta]]></dc:creator><pubDate>Sat, 30 Aug 2025 14:15:56 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/x2lMeLwtmFM/upload/21555a6bc30a840aa5a3a12b494b12c2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>The API uses nutritional data from various products and a user's health status to create a chatbot powered by an LLM. Users can interact with this chatbot to make and plan their dietary choices.</p>
<p>The <a target="_blank" href="https://tuhindutta.github.io/nutriguide/">GitHub Documentation</a> and <a target="_blank" href="https://hub.docker.com/r/tkdutta/nutriguide-api">DockerHub documentation</a> provide guides for downloading the container and using it locally. You can also deploy it by following a few simple steps. For more details on using the API, refer to the API documentation in the GitHub documentation.</p>
<h2 id="heading-testing">Testing</h2>
<ol>
<li><p>Install Docker Desktop.</p>
</li>
<li><p>Install <a target="_blank" href="https://www.postman.com/downloads/">Postman</a>.</p>
</li>
<li><p>Follow the user guide in the DockerHub documentation.</p>
</li>
<li><p>Create three <code>POST</code> requests, one for each of these endpoints:</p>
<ul>
<li><p><code>/health</code></p>
</li>
<li><p><code>/products</code></p>
</li>
<li><p><code>/query</code></p>
</li>
</ul>
</li>
<li><p>Send the health status through the <code>/health</code> endpoint. For this example, let’s use:</p>
<pre><code class="lang-json"> {
     <span class="hljs-attr">"health_status"</span>: <span class="hljs-string">"I am a hypertensive patient with a slightly high BP of 130/92. I have diabetes with a sugar level around 135. I am 5 feet 8 inches tall and weigh 78 kg. I want protein."</span>
 }
</code></pre>
</li>
<li><p>Send the nutritional information of the products through the <code>/products</code> endpoint. For this example, let’s use:</p>
<pre><code class="lang-json"> {
     <span class="hljs-attr">"products"</span>: [
         {
             <span class="hljs-attr">"data"</span>: {
                 <span class="hljs-attr">"Nutritional Content"</span>: {
                     <span class="hljs-attr">"Added Sugars"</span>: <span class="hljs-string">"40.5 g"</span>,
                     <span class="hljs-attr">"Carbohydrates"</span>: <span class="hljs-string">"56.5 g"</span>,
                     <span class="hljs-attr">"Cholesterol"</span>: <span class="hljs-string">"13.7 mg"</span>,
                     <span class="hljs-attr">"Energy"</span>: <span class="hljs-string">"559 kcal"</span>,
                     <span class="hljs-attr">"Protein"</span>: <span class="hljs-string">"6.1 g"</span>,
                     <span class="hljs-attr">"Saturated Fat"</span>: <span class="hljs-string">"21.15 g"</span>,
                     <span class="hljs-attr">"Sodium"</span>: <span class="hljs-string">"212 mg"</span>,
                     <span class="hljs-attr">"Total Fat"</span>: <span class="hljs-string">"34.85 g"</span>,
                     <span class="hljs-attr">"Total Sugars"</span>: <span class="hljs-string">"48.6 g"</span>,
                     <span class="hljs-attr">"Trans Fat"</span>: <span class="hljs-string">"0.15 g"</span>
                 }
             },
             <span class="hljs-attr">"heading"</span>: <span class="hljs-string">"Nutritional Content of Cadbury Dairy Milk Silk Oreo per 100g"</span>
         },
         {
             <span class="hljs-attr">"data"</span>: {
                 <span class="hljs-attr">"Nutritional Content"</span>: {
                     <span class="hljs-attr">"Added Sugars"</span>: <span class="hljs-string">"0 g"</span>,
                     <span class="hljs-attr">"Carbohydrates"</span>: <span class="hljs-string">"53 g"</span>,
                     <span class="hljs-attr">"Cholesterol"</span>: <span class="hljs-string">"0 mg"</span>,
                     <span class="hljs-attr">"Energy"</span>: <span class="hljs-string">"536 kcal"</span>,
                     <span class="hljs-attr">"Protein"</span>: <span class="hljs-string">"7 g"</span>,
                     <span class="hljs-attr">"Saturated Fat"</span>: <span class="hljs-string">"3.3 g"</span>,
                     <span class="hljs-attr">"Sodium"</span>: <span class="hljs-string">"525 mg"</span>,
                     <span class="hljs-attr">"Total Fat"</span>: <span class="hljs-string">"34 g"</span>,
                     <span class="hljs-attr">"Total Sugars"</span>: <span class="hljs-string">"1 g"</span>,
                     <span class="hljs-attr">"Trans Fat"</span>: <span class="hljs-string">"0.1 g"</span>
                 }
             },
             <span class="hljs-attr">"heading"</span>: <span class="hljs-string">"Nutritional Content of Lays Classic Salted Chips per 100g"</span>
         },
         {
             <span class="hljs-attr">"data"</span>: {
                 <span class="hljs-attr">"Nutritional Content"</span>: {
                     <span class="hljs-attr">"Added Sugars"</span>: <span class="hljs-string">"10.6 g"</span>,
                     <span class="hljs-attr">"Carbohydrates"</span>: <span class="hljs-string">"10.6 g"</span>,
                     <span class="hljs-attr">"Cholesterol"</span>: <span class="hljs-string">"0 mg"</span>,
                     <span class="hljs-attr">"Energy"</span>: <span class="hljs-string">"42 kcal"</span>,
                     <span class="hljs-attr">"Protein"</span>: <span class="hljs-string">"0 g"</span>,
                     <span class="hljs-attr">"Saturated Fat"</span>: <span class="hljs-string">"0 g"</span>,
                     <span class="hljs-attr">"Sodium"</span>: <span class="hljs-string">"11 mg"</span>,
                     <span class="hljs-attr">"Total Fat"</span>: <span class="hljs-string">"0 g"</span>,
                     <span class="hljs-attr">"Total Sugars"</span>: <span class="hljs-string">"10.6 g"</span>,
                     <span class="hljs-attr">"Trans Fat"</span>: <span class="hljs-string">"0 g"</span>
                 }
             },
             <span class="hljs-attr">"heading"</span>: <span class="hljs-string">"Nutritional Content of Coca-Cola per 100ml"</span>
         },
         {
             <span class="hljs-attr">"data"</span>: {
                 <span class="hljs-attr">"Nutritional Content"</span>: {
                     <span class="hljs-attr">"Added Sugars"</span>: <span class="hljs-string">"2.1 g"</span>,
                     <span class="hljs-attr">"Carbohydrates"</span>: <span class="hljs-string">"60 g"</span>,
                     <span class="hljs-attr">"Cholesterol"</span>: <span class="hljs-string">"0 mg"</span>,
                     <span class="hljs-attr">"Energy"</span>: <span class="hljs-string">"445 kcal"</span>,
                     <span class="hljs-attr">"Protein"</span>: <span class="hljs-string">"9 g"</span>,
                     <span class="hljs-attr">"Saturated Fat"</span>: <span class="hljs-string">"8.1 g"</span>,
                     <span class="hljs-attr">"Sodium"</span>: <span class="hljs-string">"880 mg"</span>,
                     <span class="hljs-attr">"Total Fat"</span>: <span class="hljs-string">"17 g"</span>,
                     <span class="hljs-attr">"Total Sugars"</span>: <span class="hljs-string">"3 g"</span>,
                     <span class="hljs-attr">"Trans Fat"</span>: <span class="hljs-string">"0.1 g"</span>
                 }
             },
             <span class="hljs-attr">"heading"</span>: <span class="hljs-string">"Nutritional Content of Maggi 2-Minute Noodles per 100g"</span>
         },
         {
             <span class="hljs-attr">"data"</span>: {
                 <span class="hljs-attr">"Nutritional Content"</span>: {
                     <span class="hljs-attr">"Added Sugars"</span>: <span class="hljs-string">"0 g"</span>,
                     <span class="hljs-attr">"Carbohydrates"</span>: <span class="hljs-string">"0 g"</span>,
                     <span class="hljs-attr">"Cholesterol"</span>: <span class="hljs-string">"215 mg"</span>,
                     <span class="hljs-attr">"Energy"</span>: <span class="hljs-string">"717 kcal"</span>,
                     <span class="hljs-attr">"Protein"</span>: <span class="hljs-string">"0.9 g"</span>,
                     <span class="hljs-attr">"Saturated Fat"</span>: <span class="hljs-string">"51 g"</span>,
                     <span class="hljs-attr">"Sodium"</span>: <span class="hljs-string">"714 mg"</span>,
                     <span class="hljs-attr">"Total Fat"</span>: <span class="hljs-string">"81 g"</span>,
                     <span class="hljs-attr">"Total Sugars"</span>: <span class="hljs-string">"0 g"</span>,
                     <span class="hljs-attr">"Trans Fat"</span>: <span class="hljs-string">"3 g"</span>
                 }
             },
             <span class="hljs-attr">"heading"</span>: <span class="hljs-string">"Nutritional Content of Amul Butter per 100g"</span>
         }
     ]
 }
</code></pre>
</li>
<li><p>Now, we can ask questions using the <code>/query</code> endpoint as shown below, with all chat history stored and tracked for context awareness:</p>
<pre><code class="lang-json"> {
     <span class="hljs-attr">"query"</span>: <span class="hljs-string">"What should I take for protein?"</span>
 }
</code></pre>
</li>
</ol>
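<p>If you prefer the command line to Postman, the same three requests can be sent with <code>curl</code> (the host and port below are assumptions; use whatever mapping your <code>docker run</code> command exposes):</p>

```shell
BASE=http://localhost:8000    # adjust to your container's mapped port

# 1. Send the health status
curl -X POST "$BASE/health" \
  -H "Content-Type: application/json" \
  -d '{"health_status": "I am a hypertensive patient ..."}'

# 2. Send the product nutrition data (payload saved as products.json)
curl -X POST "$BASE/products" \
  -H "Content-Type: application/json" \
  -d @products.json

# 3. Ask a question against the stored context
curl -X POST "$BASE/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "What should I take for protein?"}'
```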
<h2 id="heading-a-conversation-session-example">A conversation session example</h2>
<p><strong>Q1— What should I take for protein?</strong></p>
<blockquote>
<p>Here are the protein contents of the given foods:</p>
<ol>
<li><p>Cadbury Dairy Milk Silk Oreo - 6.1 g</p>
</li>
<li><p>Lays Classic Salted Chips - 7 g</p>
</li>
<li><p>Coca-Cola - 0 g</p>
</li>
<li><p>Maggi 2-Minute Noodles - 9 g</p>
</li>
<li><p>Amul Butter - 0.9 g</p>
</li>
</ol>
<p>To meet your protein needs, consider options like Maggi 2-Minute Noodles, Lays Classic Salted Chips, or Cadbury Dairy Milk Silk Oreo. However, please keep in mind your hypertension and diabetes; I will provide more tailored suggestions if needed.</p>
</blockquote>
<p><strong>Q2— Is it okay to have it in terms of fat content?</strong></p>
<blockquote>
<p>Considering your hypertension and diabetes, let's evaluate the fat content of the high-protein options:</p>
<ol>
<li><p>Maggi 2-Minute Noodles - 17 g (Total Fat), 8.1 g (Saturated Fat)</p>
</li>
<li><p>Lays Classic Salted Chips - 34 g (Total Fat), 3.3 g (Saturated Fat)</p>
</li>
<li><p>Cadbury Dairy Milk Silk Oreo - 34.85 g (Total Fat), 21.15 g (Saturated Fat)</p>
</li>
</ol>
<p>To manage your condition, it's essential to balance your fat intake. Lays Classic Salted Chips and Cadbury Dairy Milk Silk Oreo have high fat content. Maggi 2-Minute Noodles seem to be a relatively better option, but its sodium content (880 mg) is a concern for hypertension.</p>
<p>Considering both protein and fat content, Maggi 2-Minute Noodles might be a compromise, but please be mindful of its sodium and overall carbohydrate content (60 g) that may impact your diabetes.</p>
</blockquote>
<p><strong>Q3— How much of your recommendation and Coke should I consume?</strong></p>
<blockquote>
<p>I can't recommend consuming Coca-Cola due to its high sugar content (10.6 g per 100ml), which can worsen your diabetes.</p>
<p>For Maggi 2-Minute Noodles, considering your hypertension, diabetes, and protein needs, I suggest a moderate serving size. A 100g serving of Maggi 2-Minute Noodles provides 9g of protein, 17g of fat, 60g of carbohydrates, and 880mg of sodium.</p>
<p>To balance your nutrient intake, limit your serving size to 50-70g (approximately half to two-thirds of a standard packet), and consider adding vegetables or protein sources with lower sodium and fat content to make it a more balanced meal.</p>
<p>Please consult a healthcare professional or registered dietitian for personalized dietary advice.</p>
</blockquote>
<p><strong>Q4— BTW which one is the least harmful?</strong></p>
<blockquote>
<p>Considering your hypertension and diabetes, I would suggest Lays Classic Salted Chips as the least harmful option among the ones listed.</p>
<p>It provides 7g of protein and has relatively lower sodium (525mg) compared to Maggi 2-Minute Noodles. However, please be mindful of its high fat content (34g) and limit your serving size.</p>
<p>Keep in mind that it's essential to balance your diet with a variety of nutrient-dense foods and consult a healthcare professional for personalized advice.</p>
</blockquote>
<h2 id="heading-future-prospectshttpstuhinduttagithubionutriguidefuture-prospect"><a target="_blank" href="https://tuhindutta.github.io/nutriguide/future-prospect">Future Prospects</a></h2>
]]></content:encoded></item><item><title><![CDATA[Testing Smart Nutrition API]]></title><description><![CDATA[Introduction
A smart nutrition API that fetches nutritional information from pictures of labelled products. As we will see, the API returns nutritional metrics reasonably close to the actual labels, with an error rate of around 3.0%.
The techn...]]></description><link>https://techtrail.tuhindutta.com/testing-smart-nutrition-api</link><guid isPermaLink="true">https://techtrail.tuhindutta.com/testing-smart-nutrition-api</guid><category><![CDATA[APIs]]></category><category><![CDATA[Flask Framework]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[AI]]></category><category><![CDATA[llm]]></category><category><![CDATA[gemini]]></category><category><![CDATA[Nutrition]]></category><category><![CDATA[analytics]]></category><category><![CDATA[Governance]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Devops]]></category><dc:creator><![CDATA[Tuhin Kumar Dutta]]></dc:creator><pubDate>Tue, 26 Aug 2025 18:53:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/kcRFW-Hje8Y/upload/dc9cd0d2bbe2b0f1835c63120cf225e1.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>This is a smart nutrition API that fetches nutritional information from pictures of labelled products. As we will see below, the API returns nutritional metrics reasonably close to the actual labels, with an error rate of around <strong>3.0%</strong>.</p>
<p>The technical details are covered in the <a target="_blank" href="https://tuhindutta.github.io/smart-nutrition-api/">GitHub</a> documentation, and the Docker image with usage instructions is available on <a target="_blank" href="https://hub.docker.com/r/tkdutta/smart-nutrition-api">DockerHub</a>. There is also a comprehensive, experimental <a target="_blank" href="https://techtrail.tuhindutta.com/minikube-cluster-setup">guide</a> to set up a Kubernetes cluster and deploy the API container in a pod.</p>
<p>Following are minimalistic examples of <code>deployment.yaml</code> and <code>service.yaml</code> for K8s deployment:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># deployment.yaml - fill in the required API keys as environment variables.</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nutrition-api</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">nutrition-api</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">nutrition-api</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">nutrition-api</span>
        <span class="hljs-attr">image:</span> <span class="hljs-string">tkdutta/smart-nutrition-api:v1</span>
        <span class="hljs-attr">ports:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">containerPort:</span> <span class="hljs-number">5000</span>
        <span class="hljs-attr">env:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">GOOGLE_API_KEY</span>
          <span class="hljs-attr">value:</span> <span class="hljs-string">""</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">GROQ_API_KEY</span>
          <span class="hljs-attr">value:</span> <span class="hljs-string">""</span>
</code></pre>
<pre><code class="lang-yaml"><span class="hljs-comment"># service.yaml</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Service</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nutrition-api-service</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">type:</span> <span class="hljs-string">NodePort</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">nutrition-api</span>
  <span class="hljs-attr">ports:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">protocol:</span> <span class="hljs-string">TCP</span>
      <span class="hljs-attr">port:</span> <span class="hljs-number">80</span>
      <span class="hljs-attr">targetPort:</span> <span class="hljs-number">5000</span>  <span class="hljs-comment"># must match the containerPort of the deployment</span>
      <span class="hljs-attr">nodePort:</span> <span class="hljs-number">30007</span>
</code></pre>
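<p>With the two manifests saved, a typical apply-and-verify sequence on Minikube looks roughly like this:</p>

```shell
# Create (or update) the deployment and the service
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml

# Confirm the pod and service are up
kubectl get pods -l app=nutrition-api
kubectl get svc nutrition-api-service

# Minikube prints a URL that reaches the NodePort service
minikube service nutrition-api-service --url
```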
<h2 id="heading-short-description-of-the-endpoints">Short description of the endpoints</h2>
<p>Here is a quick refresher on the endpoints of the API (with more details in the above-mentioned documentation):</p>
<ol>
<li><p><code>/product-name</code>: Extract product name from an uploaded image.</p>
<ul>
<li><p><strong>Payload</strong>: Form-data with key “product” as the image file.</p>
</li>
<li><p><strong>Returns</strong>: <code>json</code> file with the extracted product name.</p>
</li>
</ul>
</li>
<li><p><code>/nutrition</code>: Get nutritional information for a given product name.</p>
<ul>
<li><p><strong>Payload</strong>: Value from the <code>/product-name</code> output.</p>
</li>
<li><p><strong>Returns</strong>: <code>json</code> file with its nutritional values.</p>
</li>
</ul>
</li>
</ol>
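<p>As a quick command-line check, both endpoints can be exercised with <code>curl</code> (the port and the <code>/nutrition</code> payload shape here are assumptions based on the deployment manifest and sample outputs in this post):</p>

```shell
# Extract the product name from a label image (multipart form upload)
curl -X POST -F "product=@label.png" http://localhost:5000/product-name

# Fetch nutrition details for the extracted product name
curl -X POST http://localhost:5000/nutrition \
  -H "Content-Type: application/json" \
  -d '{"name": "dairymilk silk oreo"}'
```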
<h2 id="heading-using-the-api">Using the API</h2>
<ol>
<li><p>Pass the following image to the <code>/product-name</code> endpoint.</p>
<p> <a target="_blank" href="https://www.amazon.in/Cadbury-Dairy-Milk-Silk-Oreo/dp/B01N3B4HGP/ref=sr_1_2_f3_0g_fs_sspa?crid=1J0V0IGL35HNM&amp;dib=eyJ2IjoiMSJ9.7CJJZeospqGY2g7TzwfcuwRAcw4FVyo37CMFg06YOp2S2YXo0C633PtLwyLnBau5Jy4tkHY-gdfIQCZXEzgXwpqfYZeIi0qu6_MTT3f2TYKidWavHWGRexPnFx6mpBk3JmWdeTaVM3igdATyMvvquxK5nGhgwbctcvXwD6QJ407PsasV46HFeha04oheY9jp0tMXG7-QBX2GlLzJmQ67USW21CNUe9zMTdQn7E86frax3E2WMgaRBx_QMZ5OxCOhBtiheQqd0ck9hd8zrB5yt_OLZJZDRskPsrHzgM6CoYE.6VtVkBve5gfk3AEO0mvDjlxUvG3eICxY4jQ_RYBU0oE&amp;dib_tag=se&amp;keywords=dairy%2Bmilk&amp;qid=1756231912&amp;s=grocery&amp;sprefix=dairy%2Bmilk%2Cgrocery%2C212&amp;sr=1-2-spons&amp;sp_csd=d2lkZ2V0TmFtZT1zcF9hdGY&amp;th=1"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756232210996/a9fa7105-7efe-4855-a43b-4c23c51cbd66.png" alt="Source: Amazon (Cadbury Dairy Milk Silk Oreo Chocolate Bars, 58.5 g)" class="image--center mx-auto" /></a></p>
</li>
<li><p>We get the following <code>json</code> output based on the above image provided:</p>
<pre><code class="lang-json"> {
     <span class="hljs-attr">"name"</span>: <span class="hljs-string">"dairymilk silk oreo"</span>
 }
</code></pre>
</li>
<li><p>Pass the above <em>product name</em> to the <code>/nutrition</code> endpoint to get the following output:</p>
<pre><code class="lang-json"> {
     <span class="hljs-attr">"data"</span>: {
         <span class="hljs-attr">"Nutritional Content"</span>: {
             <span class="hljs-attr">"Added Sugars"</span>: <span class="hljs-string">"40.5 g"</span>,
             <span class="hljs-attr">"Carbohydrates"</span>: <span class="hljs-string">"56.5 g"</span>,
             <span class="hljs-attr">"Cholesterol"</span>: <span class="hljs-string">"13.7 mg"</span>,
             <span class="hljs-attr">"Energy"</span>: <span class="hljs-string">"559 kcal"</span>,
             <span class="hljs-attr">"Protein"</span>: <span class="hljs-string">"6.1 g"</span>,
             <span class="hljs-attr">"Saturated Fat"</span>: <span class="hljs-string">"21.15 g"</span>,
             <span class="hljs-attr">"Sodium"</span>: <span class="hljs-string">"212 mg"</span>,
             <span class="hljs-attr">"Total Fat"</span>: <span class="hljs-string">"34.85 g"</span>,
             <span class="hljs-attr">"Total Sugars"</span>: <span class="hljs-string">"48.6 g"</span>,
             <span class="hljs-attr">"Trans Fat"</span>: <span class="hljs-string">"0.15 g"</span>
         }
     },
     <span class="hljs-attr">"heading"</span>: <span class="hljs-string">"Nutritional Content of Cadbury Dairy Milk Silk Oreo per 100g"</span>
 }
</code></pre>
</li>
</ol>
<h2 id="heading-validating-the-output">Validating the output</h2>
<p>Let’s validate the output with the actual label in the product:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1756232372436/35e8bcdd-9444-4d8d-91f5-762a1bc94734.png" alt class="image--center mx-auto" /></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Nutritional contents</td><td>API output (per 100 g)</td><td>Label (per 100 g)</td><td>Error %</td></tr>
</thead>
<tbody>
<tr>
<td>Energy</td><td>559.0 kcal</td><td>563.0 kcal</td><td>0.7 %</td></tr>
<tr>
<td>Carbohydrates</td><td>56.5 g</td><td>56.9 g</td><td>0.7 %</td></tr>
<tr>
<td>Protein</td><td>6.1 g</td><td>6.0 g</td><td>-1.67 %</td></tr>
<tr>
<td>Sugars</td><td>48.6 g</td><td>48.9 g</td><td>0.6 %</td></tr>
<tr>
<td>Total Fat</td><td>34.85 g</td><td>34.8 g</td><td>-0.14 %</td></tr>
<tr>
<td>Saturated Fat</td><td>21.15 g</td><td>22.3 g</td><td>5.16 %</td></tr>
<tr>
<td>Sodium</td><td>212.0 mg</td><td>207.0 mg</td><td>-2.42 %</td></tr>
<tr>
<td>Cholesterol</td><td>13.7 mg</td><td>17.4 mg</td><td>21.26 %</td></tr>
</tbody>
</table>
</div><p>Therefore, the <strong>Mean Absolute Percentage Error (MAPE)</strong> of the above is <strong>~4.1%</strong>.</p>
<p>Performing the same analysis for multiple products across various brands gives an average error rate of <strong>~3.0%</strong>.</p>
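<p>The per-nutrient figures above follow the usual percentage-error formula, <code>|api - label| / label * 100</code>, and MAPE is simply the mean of those absolute errors. As a quick check on the first two rows of the table:</p>

```shell
# Percentage error for the Energy and Carbohydrates rows (API vs. label values)
energy_err=$(awk 'BEGIN { printf "%.2f", (563 - 559) / 563 * 100 }')
carb_err=$(awk 'BEGIN { printf "%.2f", (56.9 - 56.5) / 56.9 * 100 }')
echo "Energy: ${energy_err}%  Carbohydrates: ${carb_err}%"
```

<p>Both come out to roughly 0.7%, matching the table.</p>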
<h2 id="heading-governance-trust-amp-disclaimer">Governance, Trust &amp; Disclaimer</h2>
<ol>
<li><p>Nutrition data is approximate and may differ from actual food content. Use as guidance, not as medical advice.</p>
</li>
<li><p><strong>Not a medical-grade app</strong>. For general use only; it is not to be considered an ultimate source of truth without proper research.</p>
</li>
<li><p>Data is extracted through general internet scraping.</p>
</li>
<li><p>To be treated like a <strong>navigation app</strong> - great for guidance, but <strong>not a substitute for a doctor’s prescription or lab-tested food assay</strong>.</p>
</li>
</ol>
<h2 id="heading-future-prospectshttpstuhinduttagithubionutriguidefuture-prospect"><a target="_blank" href="https://tuhindutta.github.io/nutriguide/future-prospect">Future Prospects</a></h2>
]]></content:encoded></item><item><title><![CDATA[Minikube Cluster Setup]]></title><description><![CDATA[The following guide specifically focusses on the setup of Linux (Ubuntu Server) guest in Windows 11 host.


Install and configure Oracle VirtualBox.
 

Download Ubuntu Server ISO.
 

Install the Ubuntu Server ISO in VirtualBox with the following syste...]]></description><link>https://techtrail.tuhindutta.com/minikube-cluster-setup</link><guid isPermaLink="true">https://techtrail.tuhindutta.com/minikube-cluster-setup</guid><category><![CDATA[Devops]]></category><category><![CDATA[mlops]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[containers]]></category><category><![CDATA[Docker]]></category><category><![CDATA[data-engineering]]></category><category><![CDATA[minikube]]></category><category><![CDATA[VirtualBox ]]></category><category><![CDATA[virtual machine]]></category><category><![CDATA[k8s]]></category><dc:creator><![CDATA[Tuhin Kumar Dutta]]></dc:creator><pubDate>Sun, 27 Jul 2025 19:11:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/ZfVyuV8l7WU/upload/547a3e145570f1c850c40dc375e2f6dd.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>The following guide specifically focuses on the setup of a Linux (Ubuntu Server) guest in a Windows 11 host.</p>
</blockquote>
<ol>
<li><p>Install and configure <a target="_blank" href="https://www.virtualbox.org/wiki/Downloads">Oracle VirtualBox</a>.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753639165620/6df1ff4c-7166-4e02-b3ce-61ccc2725f6c.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Download <a target="_blank" href="https://ubuntu.com/download/server">Ubuntu Server ISO</a>.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753639222443/f0e16f7b-b57d-40b8-bb8a-2e2a5c64be53.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Install Ubuntu Server from the ISO in VirtualBox, with the following system requirements for the VDI created:</p>
<ul>
<li><p>Minimum disk space: 25 GB</p>
</li>
<li><p>Minimum Memory: 4 GB</p>
</li>
<li><p>Minimum number of CPU cores: 2</p>
</li>
<li><p>Network: <strong><em>NAT</em></strong> / <strong><em>Bridged Adapter</em></strong> (Recommended)</p>
</li>
</ul>
</li>
<li><p>For a <strong><em>NAT</em></strong> network:</p>
<p> Click the ‘<strong><em>Port Forwarding</em></strong>’ button and configure it with the following details. This is required to perform <code>ssh</code> from the Windows terminal.</p>
</li>
</ol>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Name</td><td>Protocol</td><td>Host IP</td><td>Host Port</td><td>Guest IP</td><td>Guest Port</td></tr>
</thead>
<tbody>
<tr>
<td>ssh</td><td>TCP</td><td></td><td>3022</td><td></td><td>22</td></tr>
</tbody>
</table>
</div><p>For a <strong><em>Bridged Adapter</em></strong> network, find the guest machine’s IP address using the following command:</p>
<pre><code class="lang-bash">ip a
</code></pre>
<ol start="5">
<li><p>Boot and log into the VM, then run the following commands:</p>
<pre><code class="lang-bash"> <span class="hljs-comment"># Update the OS</span>
 sudo apt update &amp;&amp; sudo apt upgrade

 <span class="hljs-comment"># Install openssh-server</span>
 sudo apt install openssh-server

 <span class="hljs-comment"># Enable the ssh service using systemctl</span>
 sudo systemctl <span class="hljs-built_in">enable</span> ssh --now

 <span class="hljs-comment"># Check if the service is enabled</span>
 sudo systemctl status ssh
</code></pre>
<p> If the ssh service is enabled, the following status is displayed:</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753640726189/e56411b8-35c2-45fd-9d3d-dfb34b983e6f.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Now we can <code>ssh</code> in from the host terminal using the following command and work with the VM much as we would with a remote cloud server:</p>
<pre><code class="lang-bash"> <span class="hljs-comment"># NAT network</span>
 ssh -p 3022 vboxuser@127.0.0.1

 <span class="hljs-comment"># Bridged Adapter network</span>
 ssh vboxuser@192.xxx.x.x
</code></pre>
</li>
<li><p>Now, perform the following steps to install <code>docker</code> and <code>minikube</code>. All of these steps are covered in the <a target="_blank" href="https://minikube.sigs.k8s.io/docs/start/?arch=%2Fwindows%2Fx86-64%2Fstable%2F.exe+download">documentation of minikube</a>; what follows is a simplified walkthrough.</p>
</li>
</ol>
<h3 id="heading-docker-installation">Docker Installation</h3>
<p>A container or virtual machine manager is required; here we will use <strong>docker</strong>. The steps below follow the <a target="_blank" href="https://docs.docker.com/engine/install/ubuntu/">docker documentation</a>:</p>
<ol>
<li><p>Uninstall the unofficial packages:</p>
<pre><code class="lang-bash"> <span class="hljs-keyword">for</span> pkg <span class="hljs-keyword">in</span> docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; <span class="hljs-keyword">do</span> sudo apt-get remove <span class="hljs-variable">$pkg</span>; <span class="hljs-keyword">done</span>
</code></pre>
</li>
<li><p>Install using <code>apt-repository</code>:</p>
<pre><code class="lang-bash"> <span class="hljs-comment"># Add Docker's official GPG key:</span>
 sudo apt-get update
 sudo apt-get install ca-certificates curl
 sudo install -m 0755 -d /etc/apt/keyrings
 sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
 sudo chmod a+r /etc/apt/keyrings/docker.asc

 <span class="hljs-comment"># Add the repository to Apt sources:</span>
 <span class="hljs-built_in">echo</span> \
   <span class="hljs-string">"deb [arch=<span class="hljs-subst">$(dpkg --print-architecture)</span> signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
   <span class="hljs-subst">$(. /etc/os-release &amp;&amp; echo <span class="hljs-string">"<span class="hljs-variable">${UBUNTU_CODENAME:-<span class="hljs-variable">$VERSION_CODENAME</span>}</span>"</span>)</span> stable"</span> | \
   sudo tee /etc/apt/sources.list.d/docker.list &gt; /dev/null
 sudo apt-get update
</code></pre>
</li>
<li><p>Install the Docker packages:</p>
<pre><code class="lang-bash"> sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
</code></pre>
</li>
<li><p>Verify the installation:</p>
<pre><code class="lang-bash"> sudo docker run hello-world
</code></pre>
</li>
<li><p>Add the current user to the <code>docker</code> group with the following command. This step is a prerequisite for starting <code>minikube</code>:</p>
<pre><code class="lang-bash"> sudo usermod -aG docker <span class="hljs-variable">$USER</span> &amp;&amp; newgrp docker
</code></pre>
</li>
</ol>
<h3 id="heading-minikube-installation">Minikube Installation</h3>
<p>In the previously mentioned <a target="_blank" href="https://minikube.sigs.k8s.io/docs/start/?arch=%2Flinux%2Fx86-64%2Fstable%2Fbinary+download">minikube documentation</a>, in the ‘Installation’ section, select the buttons to describe the target platform. For our case, it is:</p>
<ul>
<li><p>Operating System: Linux</p>
</li>
<li><p>Architecture: x86-64</p>
</li>
<li><p>Release type: Stable</p>
</li>
<li><p>Installer type: Debian package</p>
</li>
</ul>
<p>Perform the following steps to install <code>minikube</code>.</p>
<ol>
<li><p>After selecting the relevant options, the command to install the <code>minikube</code> package appears below the ‘Installation’ section; run it in the terminal:</p>
<pre><code class="lang-bash"> curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube_latest_amd64.deb
 sudo dpkg -i minikube_latest_amd64.deb
</code></pre>
</li>
<li><p>Start the cluster:</p>
<pre><code class="lang-bash"> minikube start
</code></pre>
</li>
<li><p>Install <code>kubectl</code>:</p>
<pre><code class="lang-bash"> sudo snap install kubectl --classic
</code></pre>
</li>
</ol>
<p>We now have a development-ready Kubernetes cluster that can be used for hosting, serving, and testing small projects.</p>
]]></content:encoded></item><item><title><![CDATA[Speech Refiner]]></title><description><![CDATA[In the digital age, the way we communicate in writing matters more than ever. Whether it’s an email, a feedback form or a social media comment. Tone and politeness can make or even break a message. That is why, here is a free to use solution of Speec...]]></description><link>https://techtrail.tuhindutta.com/speech-refiner</link><guid isPermaLink="true">https://techtrail.tuhindutta.com/speech-refiner</guid><category><![CDATA[LLM-powered speech refiner]]></category><category><![CDATA[language tone enhancer]]></category><category><![CDATA[Groq LLM integration]]></category><category><![CDATA[Flask AI backend]]></category><category><![CDATA[speech to text]]></category><category><![CDATA[Audio Transcription]]></category><category><![CDATA[AI]]></category><category><![CDATA[#ai-tools]]></category><dc:creator><![CDATA[Tuhin Kumar Dutta]]></dc:creator><pubDate>Fri, 18 Jul 2025 14:51:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/2dA2zWv0A8o/upload/f4661fc12f1286cf7c548471b33f1ff5.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the digital age, the way we communicate in writing matters more than ever, whether it’s an email, a feedback form, or a social media comment. Tone and politeness can make or break a message. That is why <strong>Speech Refiner</strong> exists: a free-to-use, lightweight AI tool designed to improve and soften any written communication.</p>
<p>Instead of delving into the mathematics behind the scenes, this post focuses only on the practical implementation and deployment needed to deliver a ready-to-use tool.</p>
<h2 id="heading-motivation">Motivation</h2>
<p>We’ve all been there - waking up to promotional emails, automated follow-ups, or vague LinkedIn messages asking for “a quick connect.” While many of these messages border on spam, we often still need to respond, especially in professional settings without sounding rude or dismissive.</p>
<p>In day-to-day communication, whether it's replying to persistent sales emails, addressing unsolicited collaboration requests, or giving constructive feedback to a colleague, tone plays a crucial role. A blunt or emotionally charged message can harm relationships or escalate misunderstandings.</p>
<p>That’s where the idea for <strong>Speech Refiner</strong> came from:</p>
<blockquote>
<p>A tool that helps you express what you mean - firmly if needed but in a polite, professional tone.</p>
</blockquote>
<p>In a world driven by remote communication and asynchronous messaging, being clear without being curt is not just good etiquette but it’s a skill. Speech Refiner aims to make that easier, one sentence at a time.</p>
<blockquote>
<p><strong>Q: Why not just use ChatGPT?</strong><br />A: This tool offers a high degree of customization and full control to the user. It follows a plug-and-play architecture, allowing users to seamlessly integrate any local or preferred language models into the backend as per their requirements.</p>
</blockquote>
<h2 id="heading-project-overview">Project Overview</h2>
<p>The system is composed of two parts:</p>
<ul>
<li><p><strong>Frontend</strong>: An intuitive interface where users can <strong>either record their voice</strong> or <strong>upload a pre-recorded audio clip</strong> containing a message they wish to refine. <em><mark>(</mark></em><a target="_blank" href="https://tuhindutta.github.io/Speech-Refiner/"><em><mark>GitHub documentation</mark></em></a><em><mark>)</mark></em></p>
</li>
<li><p><strong>Backend</strong>: An LLM-powered custom API that receives recordings from the frontend and rewrites them with a softened tone, courtesy phrases, and contextual phrasing. <em><mark>(</mark></em><a target="_blank" href="https://tuhindutta.github.io/Politeness-Engine-API-Backend/#-endpoint-specification"><em><mark>GitHub documentation</mark></em></a><em><mark>)</mark></em></p>
</li>
</ul>
<p>Whether it's a voice note, a spontaneous thought, or a casual remark, Speech Refiner transforms it into a professional, composed version - ready to be used in emails, meetings, or digital conversations. It bridges the gap between natural speech and formal communication, helping users convey their intent with clarity and courtesy.</p>
<h2 id="heading-application-screenshot">Application Screenshot</h2>
<p><img src="https://github.com/user-attachments/assets/82a46653-ac8c-49bd-9c26-27aafba944cc" alt="image" /></p>
<blockquote>
<p><mark>The focus will primarily be on the backend, given the AI-centric nature of our discussion, though we’ll also briefly touch on the frontend toward the end.</mark></p>
</blockquote>
<h2 id="heading-backend">Backend</h2>
<p>The backbone of the <strong>Speech Refiner</strong> application is a robust, modular, and production-ready backend service. Designed with a focus on <strong>speech-to-text transcription</strong> and <strong>language refinement</strong>, the backend plays a critical role in transforming voice inputs into polished, professional responses.</p>
<p>Let’s take a look at how it works under the hood.</p>
<blockquote>
<p><mark>The backend API GitHub repository has been kept private to prevent misuse, such as unauthorized API calls or abuse.</mark></p>
</blockquote>
<h3 id="heading-overview">Overview</h3>
<p>The <strong>Politeness Engine Backend</strong> is a RESTful API built using <strong>Flask</strong>, designed to handle audio files submitted from the frontend. It performs two key tasks:</p>
<ol>
<li><p><strong>Transcribes speech</strong> from the uploaded audio using a speech recognition engine.</p>
</li>
<li><p><strong>Refines the transcribed text</strong> using a Large Language Model (LLM) hosted on Groq.</p>
</li>
</ol>
<p>This backend is lightweight, scalable, and secure, making it ready for integration into real-world applications, whether as a web client, desktop interface, or mobile app.</p>
<h3 id="heading-system-architecture">System Architecture</h3>
<p>The core system is organized across three main files:</p>
<ul>
<li><p><code>main.py</code><br />  Acts as the entry point of the application. It defines the <code>/upload</code> endpoint, handles file uploads, manages temporary file storage, and orchestrates the processing flow.</p>
</li>
<li><p><code>utils.py</code><br />  Contains two utility classes:</p>
<ul>
<li><p><code>Transcription</code>: Converts audio input into text using the <a target="_blank" href="https://pypi.org/project/SpeechRecognition/">Google Speech Recognition API</a> via the <code>speech_recognition</code> package.</p>
</li>
<li><p><code>LLM</code>: Sends the transcribed text to a <strong>Groq-hosted LLM API</strong> with a predefined prompt, returning an enhanced, more polite version of the message.</p>
</li>
</ul>
</li>
<li><p><code>requirements.txt</code><br />  Lists all dependencies required to run the service, including Flask, CORS handling, rate limiting, and audio/LLM utilities.</p>
</li>
</ul>
<h3 id="heading-api-endpoint-upload">API Endpoint: <code>/upload</code></h3>
<p>The backend exposes a single public endpoint:</p>
<ul>
<li><p><strong>Method</strong>: <code>POST</code></p>
</li>
<li><p><strong>Route</strong>: <code>/upload</code></p>
</li>
<li><p><strong>Content-Type</strong>: <code>multipart/form-data</code></p>
</li>
<li><p><strong>Form Field</strong>: <code>audio</code> - Accepts audio files (WAV)</p>
</li>
<li><h4 id="heading-request-flow">Request Flow:</h4>
<p>  1. User uploads an audio file from the frontend interface.</p>
<p>  2. File is saved temporarily in the <code>uploads/</code> directory.</p>
<p>  3. Audio is loaded into memory using <a target="_blank" href="http://scipy.io"><code>scipy.io</code></a><code>.wavfile</code>.</p>
<p>  4. The <code>Transcription</code> module converts the audio into raw text.</p>
<p>  5. The text is truncated to 100 words to manage LLM input token limits.</p>
<p>  6. The <code>LLM</code> module sends the text to the Groq API for refinement.</p>
<p>  7. Refined output is returned alongside the original transcription.</p>
<p>  8. The temporary file is deleted to maintain a clean and secure environment.</p>
</li>
</ul>
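<p>Step 5 above, trimming the transcript to respect LLM token limits, is worth a quick illustration. A minimal sketch in Python (the function name is illustrative, not the backend's actual helper):</p>
<pre><code class="lang-python"># keep only the first `limit` words of the transcription (step 5 above)
def truncate_words(text, limit=100):
    """Drop everything after the first `limit` whitespace-separated words."""
    return " ".join(text.split()[:limit])

sample = "word " * 120                       # a 120-word transcript
print(len(truncate_words(sample).split()))   # 100
</code></pre>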
<h3 id="heading-successful-response-200-ok">Successful Response (200 OK)</h3>
<pre><code class="lang-json">{
  <span class="hljs-attr">"input"</span>: <span class="hljs-string">"Raw transcribed text"</span>,
  <span class="hljs-attr">"output"</span>: <span class="hljs-string">"Refined and polite version of the message"</span>
}
</code></pre>
<h3 id="heading-error-handling">Error Handling</h3>
<ul>
<li><p>If no audio file is provided, a <code>400 Bad Request</code> is returned with a helpful error message.</p>
</li>
<li><p>In case of unexpected issues (e.g., invalid formats, API failures), a <code>500 Internal Server Error</code> is returned with the error trace for debugging.</p>
</li>
</ul>
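<p>The success and error paths above amount to a small contract. The sketch below models it as a plain function, with <code>transcribe</code> and <code>refine</code> standing in for the backend's <code>Transcription</code> and <code>LLM</code> modules; the error messages are illustrative, not the service's exact wording:</p>
<pre><code class="lang-python">def handle_upload(files, transcribe, refine):
    """Return (json_body, http_status) following the /upload contract."""
    if "audio" not in files:
        return {"error": "no audio file provided in form field 'audio'"}, 400
    try:
        text = transcribe(files["audio"])
        return {"input": text, "output": refine(text)}, 200
    except Exception as exc:  # invalid format, transcription or LLM API failure
        return {"error": str(exc)}, 500

body, status = handle_upload({}, None, None)
print(status)  # 400
body, status = handle_upload({"audio": b"RIFF..."},
                             lambda audio: "send the report",
                             lambda text: "Could you please send the report?")
print(status, body["output"])  # 200 Could you please send the report?
</code></pre>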
<h3 id="heading-rate-limiting-amp-security">Rate Limiting &amp; Security</h3>
<p>To prevent abuse and ensure fair usage, the backend implements <strong>IP-based rate limiting</strong> using <code>Flask-Limiter</code>. Limits are:</p>
<ul>
<li><p><strong>10 requests per minute</strong></p>
</li>
<li><p><strong>150 requests per day</strong></p>
</li>
</ul>
<p>Rate limits are stored in <strong>Redis</strong>, managed through:</p>
<ul>
<li><p><strong>Redis Hosting</strong>: <a target="_blank" href="https://upstash.com/">Upstash</a></p>
</li>
<li><p><strong>Configuration</strong>: Passed via <code>REDIS_URI</code> environment variable</p>
</li>
<li><p><strong>Fallback</strong>: Defaults to in-memory storage if Redis is unavailable</p>
</li>
</ul>
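<p>Conceptually, Flask-Limiter records recent request timestamps per client IP and rejects a request once the window is full. A small in-memory sliding-window limiter illustrates the idea (a sketch of the concept, not the library's actual implementation):</p>
<pre><code class="lang-python">import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds per client IP."""

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        while q and now - q[0] >= self.window_s:  # drop hits outside the window
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

limiter = SlidingWindowLimiter(limit=10, window_s=60.0)
print(all(limiter.allow("203.0.113.7", now=float(i)) for i in range(10)))  # True
print(limiter.allow("203.0.113.7", now=10.5))   # False: 11th request inside the window
print(limiter.allow("198.51.100.2", now=10.5))  # True: limits are tracked per IP
</code></pre>
<p>Flask-Limiter layers persistence (Redis via Upstash here) and HTTP response handling on top of this same bookkeeping.</p>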
<h3 id="heading-llm-integration-groq-api">LLM Integration (Groq API)</h3>
<p>The refinement is powered by a <strong>Groq-hosted Large Language Model</strong> (<code>llama-3.3-70b-versatile</code>):</p>
<ul>
<li><p>Transcribed text is passed to the LLM with a carefully crafted prompt.</p>
</li>
<li><p>The <code>LLM.query_llm()</code> method sends this data via a secure API request.</p>
</li>
<li><p>The API key (<code>GROQ_API_KEY</code>) is stored as an environment variable and never exposed to the client.</p>
</li>
</ul>
<p>This ensures that the entire language processing logic remains server-side and protected.</p>
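<p>The request itself follows the familiar chat-completions shape. The sketch below assembles the payload; the system prompt is a stand-in for the backend's actual prompt, and the URL reflects Groq's OpenAI-compatible endpoint:</p>
<pre><code class="lang-python">GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"  # OpenAI-compatible endpoint

def build_payload(text, model="llama-3.3-70b-versatile"):
    """Assemble the chat request; this system prompt is only a stand-in."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Rewrite the user's message in a polite, professional tone."},
            {"role": "user", "content": text},
        ],
    }

# The actual call stays server-side (needs `import os, requests`); the key is
# read from the GROQ_API_KEY environment variable and never sent to clients:
# requests.post(GROQ_URL,
#               headers={"Authorization": "Bearer " + os.environ["GROQ_API_KEY"]},
#               json=build_payload(transcribed_text))
print(build_payload("send it now")["model"])  # llama-3.3-70b-versatile
</code></pre>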
<h3 id="heading-deployment-overview">Deployment Overview</h3>
<p>The backend is deployed on <strong>Render</strong>, a modern cloud platform for hosting APIs.</p>
<ul>
<li><p><strong>Server Stack</strong>: Flask app served via <code>Gunicorn</code>, a production-ready WSGI HTTP server.</p>
</li>
<li><p><strong>Environment Variables</strong>:</p>
<ul>
<li><p><code>GROQ_API_KEY</code> – LLM API key</p>
</li>
<li><p><code>REDIS_URI</code> – Redis connection string (via Upstash)</p>
</li>
</ul>
</li>
</ul>
<p>Uploaded audio files are <strong>never persisted</strong> long-term; they are deleted immediately after processing.</p>
<h3 id="heading-key-dependencies">Key Dependencies</h3>
<p>The application uses the following Python libraries:</p>
<ul>
<li><p><code>Flask</code>, <code>flask-cors</code> – REST API and CORS support</p>
</li>
<li><p><code>Flask-Limiter</code>, <code>redis</code> – Rate limiting infrastructure</p>
</li>
<li><p><code>speechrecognition</code>, <code>scipy</code>, <code>soundfile</code>, <code>numpy</code> – Audio handling and transcription</p>
</li>
<li><p><code>requests</code> – Communicating with the Groq API</p>
</li>
<li><p><code>gunicorn</code> – Serving the app in production</p>
</li>
</ul>
<h2 id="heading-frontend">Frontend</h2>
<blockquote>
<p><a target="_blank" href="https://github.com/tuhindutta/Speech-Refiner"><mark>GitHub Repository</mark></a></p>
</blockquote>
<p>The <strong>Speech Refiner</strong> frontend acts as a clean and intuitive interface for interacting with the backend AI engine. It provides two primary modes of input:</p>
<ul>
<li><p><strong>Live voice recording</strong> using the browser microphone</p>
</li>
<li><p><strong>Audio file upload</strong> (WAV format)</p>
</li>
</ul>
<p>Once an audio input is submitted, it is sent to the <code>/upload</code> endpoint of the backend. Upon successful processing, users receive the <strong>original transcription</strong> and the <strong>refined, polite version</strong> of their message, rendered instantly within the interface.</p>
<h3 id="heading-tech-stack">Tech Stack</h3>
<ul>
<li><p><strong>HTML + Vanilla JavaScript</strong>: Lightweight and dependency-free</p>
</li>
<li><p><strong>Web Audio API &amp; MediaRecorder</strong>: For live voice recording</p>
</li>
<li><p><strong>Fetch API</strong>: Handles communication with the Flask backend</p>
</li>
</ul>
<h3 id="heading-packaging-for-distribution">Packaging for Distribution</h3>
<ul>
<li><p>To run the app, run:</p>
<pre><code class="lang-bash">  npx electron .
</code></pre>
</li>
<li><p>To build the <code>.exe</code>:</p>
<pre><code class="lang-bash">  npm install
  npx electron-packager . SpeechRefiner --platform=win32 --arch=x64 --icon=favicon.ico --overwrite
</code></pre>
<blockquote>
<p>This will create a packaged version of the app using Electron Packager or Electron Forge (as configured).</p>
</blockquote>
</li>
</ul>
<h3 id="heading-security-considerations">Security Considerations</h3>
<p>Since audio data is sensitive, the design ensures that <strong>no processing happens on the client side</strong>. All voice inputs are securely transmitted to the backend over HTTPS, where transcription and LLM processing take place. The backend API URL is abstracted, and no secrets or tokens are exposed on the frontend.</p>
<p>This separation of concerns ensures a secure and privacy-respecting user experience.</p>
<h3 id="heading-content-security-policy-csp-troubleshooting">Content Security Policy (CSP) Troubleshooting</h3>
<ol>
<li><p>Local API vs Hosted API Issues</p>
<ul>
<li><p>Issue: Hosted API (<code>http://192.168.x.x:5000</code>) didn’t respond as expected inside Electron.</p>
</li>
<li><p>Cause: Hosted API had longer response time (~5 sec) with no visual feedback.</p>
</li>
<li><p>Fixes:</p>
<ul>
<li><p>Added a <code>Processing...</code> loader during API call.</p>
</li>
<li><p>Verified API response behavior using Postman/browser.</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p>Unsupported Audio Format (WebM instead of WAV)</p>
<ul>
<li><p>Error:</p>
<pre><code class="lang-bash">  File format b<span class="hljs-string">'\x1aE\xdf\xa3'</span> not understood. Only <span class="hljs-string">'RIFF'</span> and <span class="hljs-string">'RIFX'</span> supported.
</code></pre>
</li>
<li><p>Cause: MediaRecorder API defaulted to WebM; Flask expected <code>.wav</code>.</p>
</li>
<li><p>Fix: Switched to <code>recorder.js</code> which generates proper <code>.wav</code> output.</p>
</li>
</ul>
</li>
<li><p>Invalid WAV Header (nAvgBytesPerSec mismatch)</p>
<ul>
<li><p>Error:</p>
<pre><code class="lang-bash">  WAV header is invalid: nAvgBytesPerSec must equal product of nSamplesPerSec and nBlockAlign
</code></pre>
</li>
<li><p>Cause: Some versions of Recorder.js generated incorrect headers.</p>
</li>
<li><p>Fixes:</p>
<ul>
<li><p>CDN failed due to MIME issues.</p>
</li>
<li><p>Forked versions had broken links.</p>
</li>
<li><p>Manually downloaded corrected <code>recorder.js</code> from GitHub and loaded it locally.</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p>MIME Type Execution Errors</p>
<ul>
<li><p>Error:</p>
<pre><code class="lang-bash">  Refused to execute script from CDN because its MIME <span class="hljs-built_in">type</span> was <span class="hljs-string">'text/plain'</span>
</code></pre>
</li>
<li><p>Fix: Used local version of recorder.js:</p>
<pre><code class="lang-xml">  <span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"recorder.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
</code></pre>
</li>
</ul>
</li>
<li><p>Electron Warning</p>
<ul>
<li><p>Message:</p>
<pre><code class="lang-bash">  Insecure Content-Security-Policy: no CSP or unsafe-eval used
</code></pre>
</li>
<li><p>Strict CSP Attempt</p>
<pre><code class="lang-xml">  <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">http-equiv</span>=<span class="hljs-string">"Content-Security-Policy"</span> <span class="hljs-attr">content</span>=<span class="hljs-string">"default-src 'self'; script-src 'self'; connect-src http://192.168.x.x:5000;"</span>&gt;</span>
</code></pre>
<ul>
<li><p>Issue:</p>
<ul>
<li><p>Inline scripts blocked.</p>
</li>
<li><p>Microphone stopped.</p>
</li>
<li><p>API hit prematurely without file.</p>
</li>
</ul>
</li>
<li><p>Root Cause</p>
<ul>
<li><p>Electron apps commonly use inline scripts or libraries requiring relaxed policies.</p>
</li>
<li><p>Strict CSP blocks <code>eval</code>, inline JavaScript, dynamic execution.</p>
</li>
</ul>
</li>
<li><p>Solutions Attempted</p>
<ul>
<li><p>Tried relaxed CSP with:</p>
<pre><code class="lang-javascript">  script-src <span class="hljs-string">'self'</span> <span class="hljs-string">'unsafe-inline'</span>
</code></pre>
</li>
<li><p>Inline scripts worked, but reintroduced security risks (e.g., XSS).</p>
</li>
</ul>
</li>
<li><p>Final Decision</p>
<ul>
<li><p>CSP not applied now due to dev-time constraints.</p>
</li>
<li><p>Plan:</p>
<ul>
<li><p>Keep .exe private.</p>
</li>
<li><p>Share code with API placeholder.</p>
</li>
<li><p>Let users build locally and request API key if needed.</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<blockquote>
<h5 id="heading-recorderjs-is-downloaded-from-herehttpsrawgithubusercontentcomsophisterrecorderjs-exmasterdistrecorderjs"><code>recorder.js</code> is downloaded from <a target="_blank" href="https://raw.githubusercontent.com/sophister/recorderjs-ex/master/dist/recorder.js"><mark>here</mark></a>.</h5>
</blockquote>
<h3 id="heading-recommendations-for-future-deployment"><strong>Recommendations for Future Deployment</strong></h3>
<ul>
<li><p>Extract all inline scripts into external files.</p>
</li>
<li><p>Set a strict and secure CSP header.</p>
</li>
<li><p>Remove <code>unsafe-inline</code> and <code>unsafe-eval</code>.</p>
</li>
<li><p>Validate microphone permissions and backend headers for production use.</p>
</li>
</ul>
]]></content:encoded></item></channel></rss>