{"id":3910,"date":"2025-06-23T16:25:41","date_gmt":"2025-06-23T14:25:41","guid":{"rendered":"https:\/\/exceltic.com\/?p=3910"},"modified":"2026-03-19T13:28:24","modified_gmt":"2026-03-19T12:28:24","slug":"generative-ia-with-full-control-small-slms-language-models-and-quantisation-2","status":"publish","type":"post","link":"https:\/\/exceltic.serquo.com\/en\/generative-ia-with-full-control-small-slms-language-models-and-quantisation-2\/","title":{"rendered":"Generative AI with Total Control: Small Language Models (SLMs) and Quantization"},"content":{"rendered":"<p class=\"has-medium-font-size\">Generative AI has changed the way we interact with technology, but its widespread use presents challenges in terms of privacy, governance and resource efficiency. In this article we examine how Small Language Models (SLMs) offer a more controlled and efficient alternative to Large Language Models (LLMs), and how model quantization can further improve their performance.<\/p>\n\n\n\n<h2 class=\"wp-block-heading has-tertiary-color has-text-color has-link-color has-large-font-size wp-elements-e76b0c190fbc11e2b681060fe0464846\">What are SLMs?<\/h2>\n\n\n\n<p class=\"has-medium-font-size\">Small Language Models (SLMs) are language models with an architecture similar to that of LLMs but with a much smaller number of parameters, used for natural language processing, understanding and content generation.<\/p>\n\n\n\n<p class=\"has-medium-font-size\">These are their main advantages:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"has-medium-font-size\"><strong>Low resource consumption (RAM\/CPU\/GPU):<\/strong> They require less processing power, which makes them easier to deploy on more affordable hardware.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>Greater control:<\/strong> They can be used in both local and private environments, which ensures greater security and governance.<\/li>\n\n\n\n<li 
class=\"has-medium-font-size\"><strong>Increased inference speed:<\/strong> Responses are generated faster and more efficiently because there are fewer parameters.<\/li>\n<\/ul>\n\n\n\n<p class=\"has-medium-font-size\">Examples of well-known SLMs:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"has-medium-font-size\"><strong>Mistral 7B<\/strong><\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>Phi-2<\/strong> from Microsoft (~2.7B parameters)<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>TinyLLaMA<\/strong><\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>Gemma 2B<\/strong> from Google<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading has-large-font-size\">Model Quantization: Reducing Size without Sacrificing Precision<\/h2>\n\n\n\n<p class=\"has-medium-font-size\">One of the main obstacles to deploying AI models is their size and the processing power they require. Here, quantization plays a key role. This technique decreases model size by converting high-precision weights (FP32, FP16) to low-precision weights (INT8, INT4) without significantly affecting model performance. Advantages include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"has-medium-font-size\"><strong>Reduced memory usage:<\/strong> Allows models to be stored and run on devices with limited capacity.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>Increased GPU\/CPU efficiency:<\/strong> Reduces GPU\/CPU load by speeding up the underlying mathematical operations.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>Accelerated inference:<\/strong> Models respond more quickly because lower-precision calculations are cheaper to execute.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading has-large-font-size\">Comparison of SLMs and LLMs<\/h2>\n\n\n\n<p class=\"has-medium-font-size\">While LLMs have proven to be powerful tools, SLMs offer important advantages in situations where efficiency and privacy are paramount. 
Below, we compare the two approaches:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"741\" height=\"426\" src=\"https:\/\/exceltic.com\/wp-content\/uploads\/2025\/06\/tabla-comparacion-SLM-y-LLM.png\" alt=\"Comparison table of SLMs and LLMs\" class=\"wp-image-3911\" srcset=\"https:\/\/exceltic.serquo.com\/wp-content\/uploads\/2025\/06\/tabla-comparacion-SLM-y-LLM.png 741w, https:\/\/exceltic.serquo.com\/wp-content\/uploads\/2025\/06\/tabla-comparacion-SLM-y-LLM-300x172.png 300w, https:\/\/exceltic.serquo.com\/wp-content\/uploads\/2025\/06\/tabla-comparacion-SLM-y-LLM-18x10.png 18w\" sizes=\"auto, (max-width: 741px) 100vw, 741px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading has-large-font-size\">Retrieval-Augmented Generation (RAG)<\/h2>\n\n\n\n<p class=\"has-medium-font-size\">The Retrieval-Augmented Generation (RAG) technique is used to improve the accuracy and contextualisation of models. This method enriches responses by retrieving information from additional sources and improving the context prior to text generation. 
Its structure consists of:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li class=\"has-medium-font-size\"><strong>Chunking<\/strong>: fragmentation of the data into manageable parts.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>Document embeddings<\/strong>: conversion of the text into numeric vectors.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>Vector database (VectorDB)<\/strong>: a database that stores the embeddings and retrieves the relevant data.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>Information retrieval<\/strong>: when faced with a query, the most relevant pieces of information are located.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>Response generation<\/strong>: synthesis of the contextualised information to improve the output of the model.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading has-large-font-size\">Implementing SLMs + RAG: An Efficient and Secure Model<\/h2>\n\n\n\n<p class=\"has-medium-font-size\">The combination of SLMs with the RAG strategy enables the creation of highly efficient and controllable generative AI systems. 
With this architecture, organisations can use optimised models that ensure greater privacy while using fewer resources.<\/p>\n\n\n\n<h3 class=\"wp-block-heading has-medium-large-font-size\">Key Benefits:<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"has-medium-font-size\"><strong>Optimised use of data:<\/strong> The inclusion of information retrieval allows for more accurate and better-informed responses.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>Full control over the model:<\/strong> Avoids the need for third-party services and allows customisation of AI behaviour.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>Execution in restricted environments:<\/strong> Because they can be quantized and are smaller, SLMs can be deployed on edge devices or local servers.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading has-large-font-size\">Basic Architecture with a Quantized Model, LangChain, RAG and FastAPI<\/h2>\n\n\n\n<p class=\"has-medium-font-size\">The following architecture can be used to create an efficient SLM environment:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li class=\"has-medium-font-size\"><strong>Loading of the quantized model:<\/strong> a model that has been previously quantized to INT8 or INT4 is used to optimise its performance.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>LangChain for prompt management:<\/strong> LangChain makes it possible to structure and extend the requests sent to the model.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>Use of RAG for enhanced retrieval:<\/strong> makes use of vector databases to improve the context of responses.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>REST API with FastAPI:<\/strong> exposes the model through a REST API to facilitate integration with other applications.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading has-medium-large-font-size\">Example code:<\/h3>\n\n\n\n<p class=\"has-medium-font-size\">from fastapi import FastAPI, 
HTTPException<\/p>\n\n\n\n<p class=\"has-medium-font-size\">from langchain.chains import RetrievalQA<\/p>\n\n\n\n<p class=\"has-medium-font-size\">from langchain.vectorstores import FAISS<\/p>\n\n\n\n<p class=\"has-medium-font-size\">from langchain.embeddings import HuggingFaceEmbeddings<\/p>\n\n\n\n<p class=\"has-medium-font-size\">from langchain.llms import HuggingFacePipeline<\/p>\n\n\n\n<p class=\"has-medium-font-size\">from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"has-medium-font-size\"># Load quantized model<\/p>\n\n\n\n<p class=\"has-medium-font-size\">tokenizer = AutoTokenizer.from_pretrained(\"model-quantified\")<\/p>\n\n\n\n<p class=\"has-medium-font-size\">model = AutoModelForCausalLM.from_pretrained(\"model-quantified\", quantization_config=BitsAndBytesConfig(load_in_8bit=True))<\/p>\n\n\n\n<p class=\"has-medium-font-size\">pipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)<\/p>\n\n\n\n<p class=\"has-medium-font-size\">llm = HuggingFacePipeline(pipeline=pipe)<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p class=\"has-medium-font-size\"># Load Vector Database for RAG<\/p>\n\n\n\n<p class=\"has-medium-font-size\">embeddings = HuggingFaceEmbeddings(model_name=\"sentence-transformers\/all-MiniLM-L6-v2\")<\/p>\n\n\n\n<p class=\"has-medium-font-size\">db = FAISS.load_local(\"ruta_vector_db\", embeddings)<\/p>\n\n\n\n<p class=\"has-medium-font-size\">retriever = db.as_retriever()<\/p>\n\n\n\n<p class=\"has-medium-font-size\">qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)<\/p>\n\n\n\n<p class=\"has-medium-font-size\">app = FastAPI()<\/p>\n\n\n\n<p class=\"has-medium-font-size\">@app.post(\"\/generate\")<\/p>\n\n\n\n<p class=\"has-medium-font-size\">def generate_response(prompt: str):<\/p>\n\n\n\n<p class=\"has-medium-font-size\">&nbsp;&nbsp;&nbsp;&nbsp;try:<\/p>\n\n\n\n<p class=\"has-medium-font-size\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;response = qa_chain.run(prompt)<\/p>\n\n\n\n<p class=\"has-medium-font-size\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return {\"answer\": response}<\/p>\n\n\n\n<p class=\"has-medium-font-size\">&nbsp;&nbsp;&nbsp;&nbsp;except Exception as e:<\/p>\n\n\n\n<p class=\"has-medium-font-size\">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;raise HTTPException(status_code=500, detail=str(e))<\/p>\n\n\n\n<p class=\"has-medium-font-size\">if __name__ == \"__main__\":<\/p>\n\n\n\n<p class=\"has-medium-font-size\">&nbsp;&nbsp;&nbsp;&nbsp;import uvicorn<\/p>\n\n\n\n<p class=\"has-medium-font-size\">&nbsp;&nbsp;&nbsp;&nbsp;uvicorn.run(app, host=\"0.0.0.0\", port=8000)<\/p>\n\n\n\n<h2 class=\"wp-block-heading has-large-font-size\">Explanation<\/h2>\n\n\n\n<h3 class=\"wp-block-heading has-medium-large-font-size\">1. Loading the quantized model<\/h3>\n\n\n\n<p class=\"has-medium-font-size\"><em>tokenizer = AutoTokenizer.from_pretrained(\"model-quantified\")<\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><em>model = AutoModelForCausalLM.from_pretrained(\"model-quantified\", quantization_config=BitsAndBytesConfig(load_in_8bit=True))<\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><em>pipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer)<\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><em>llm = HuggingFacePipeline(pipeline=pipe)<\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"has-medium-font-size\"><strong>AutoTokenizer.from_pretrained(\"model-quantified\"): <\/strong>Loads the tokeniser of the quantized model.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>AutoModelForCausalLM.from_pretrained(\"model-quantified\", quantization_config=BitsAndBytesConfig(load_in_8bit=True)): <\/strong>Loads the model with 8-bit quantization (via the bitsandbytes integration), which reduces memory 
usage and speeds up inference.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>pipeline(\"text-generation\", model=model, tokenizer=tokenizer):<\/strong> Creates a text generation pipeline based on the quantized model.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>HuggingFacePipeline(pipeline=pipe):<\/strong> Integrates the pipeline into LangChain for later use in the RAG architecture.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading has-medium-large-font-size\">2. Retrieval-Augmented Generation (RAG) configuration:<\/h3>\n\n\n\n<p class=\"has-medium-font-size\"><em>embeddings = HuggingFaceEmbeddings(model_name=\"sentence-transformers\/all-MiniLM-L6-v2\")<\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><em>db = FAISS.load_local(\"ruta_vector_db\", embeddings)<\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><em>retriever = db.as_retriever()<\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><em>qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)<\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"has-medium-font-size\"><strong>HuggingFaceEmbeddings(model_name=\"sentence-transformers\/all-MiniLM-L6-v2\"): <\/strong>Uses an embedding model (all-MiniLM-L6-v2) to convert text into vector representations.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>FAISS.load_local(\"ruta_vector_db\", embeddings): <\/strong>Loads a FAISS vector database with previously generated embeddings.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>db.as_retriever():<\/strong> Turns the database into a retriever that returns the most relevant information.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>RetrievalQA.from_chain_type(llm=llm, retriever=retriever):<\/strong> Combines the quantized language model with information retrieval to improve response generation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading has-medium-large-font-size\">3. 
Creating the API with FastAPI:<\/h3>\n\n\n\n<p class=\"has-medium-font-size\"><em>app = FastAPI()<\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><em>@app.post(\"\/generate\")<\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><em>def generate_response(prompt: str):<\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><em>&nbsp;&nbsp;&nbsp;&nbsp;try:<\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><em>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;response = qa_chain.run(prompt)<\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><em>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return {\"answer\": response}<\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><em>&nbsp;&nbsp;&nbsp;&nbsp;except Exception as e:<\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><em>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;raise HTTPException(status_code=500, detail=str(e))<\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"has-medium-font-size\"><strong>FastAPI(): <\/strong>Creates a REST API to expose the RAG model and its functionality.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>@app.post(\"\/generate\"):<\/strong> Defines a \/generate endpoint that accepts POST requests with an input prompt.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>qa_chain.run(prompt): <\/strong>Uses a combination of information retrieval (RAG) and text generation to produce the answer.<\/li>\n\n\n\n<li class=\"has-medium-font-size\"><strong>Exception handling:<\/strong> If an error occurs, an HTTP 500 code is returned with the error message.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading has-medium-large-font-size\">4. 
Server execution:<\/h3>\n\n\n\n<p class=\"has-medium-font-size\"><em>if __name__ == \"__main__\":<\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><em>&nbsp;&nbsp;&nbsp;&nbsp;import uvicorn<\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><em>&nbsp;&nbsp;&nbsp;&nbsp;uvicorn.run(app, host=\"0.0.0.0\", port=8000)<\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"has-medium-font-size\"><code><strong>uvicorn.run(app, host=\"0.0.0.0\", port=8000)<\/strong><\/code>: Starts the server on port 8000, allowing access to the API.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading has-medium-large-font-size\">Conclusion<\/h3>\n\n\n\n<p class=\"has-medium-font-size\">Companies and developers can now adopt <strong>lighter, faster and more private models thanks to SLMs (Small Language Models)<\/strong>, a key breakthrough in the evolution of generative AI. <strong>Model quantization<\/strong>, which significantly reduces size and computational requirements without compromising core performance, allows these models to be run in <strong>on-premise<\/strong> environments or on resource-constrained devices, while maintaining full control over data and processes.<\/p>\n\n\n\n<p class=\"has-medium-font-size\">This approach is complemented by an architecture based on <strong>RAG (Retrieval-Augmented Generation)<\/strong>, together with tools such as <strong>FastAPI and LangChain<\/strong>, enabling the deployment of AI solutions that are governable, auditable and tailored to specific requirements. 
These strategies make <strong>fully controlled generative AI<\/strong> possible, making it a realistic and effective choice for demanding sectors such as data analysis, scientific research or customer service.<\/p>\n\n\n\n<p class=\"has-medium-font-size\">The combination of quantized SLMs, a modular architecture and autonomous deployment represents one of the most secure and efficient ways to integrate generative AI into your organisation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading has-medium-large-font-size\">Want to see how this translates into a real case?<\/h3>\n\n\n\n<p class=\"has-medium-font-size\">Access the full paper and find out more.<\/p>\n\n\n","protected":false},"excerpt":{"rendered":"<p>Implement generative AI with small, quantized models. Gain efficiency, privacy and full control with SLM and RAG in secure environments.<\/p>","protected":false},"author":1,"featured_media":3916,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-3910","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-eventos"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>IA Generativa con Control Total: Peque\u00f1os Modelos de Lenguaje (SLMs) y Cuantizaci\u00f3n -<\/title>\n<meta name=\"description\" content=\"Implementa IA generativa con modelos peque\u00f1os y cuantizados. Gana eficiencia, privacidad y control total con SLM y RAG en entornos seguros.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/exceltic.serquo.com\/en\/generative-ia-with-full-control-small-slms-language-models-and-quantisation-2\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"IA Generativa con Control Total: Peque\u00f1os Modelos de Lenguaje (SLMs) y Cuantizaci\u00f3n -\" \/>\n<meta property=\"og:description\" content=\"Implementa IA generativa con modelos peque\u00f1os y cuantizados. 
Gana eficiencia, privacidad y control total con SLM y RAG en entornos seguros.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/exceltic.serquo.com\/en\/generative-ia-with-full-control-small-slms-language-models-and-quantisation-2\/\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/exceltic\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-06-23T14:25:41+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-03-19T12:28:24+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/exceltic.serquo.com\/wp-content\/uploads\/2025\/06\/portada-blog-ia-generativa.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Serquo Admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@exceltic\" \/>\n<meta name=\"twitter:site\" content=\"@exceltic\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Serquo Admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/exceltic.serquo.com\/ia-generativa-con-control-total-pequenos-modelos-de-lenguaje-slms-y-cuantizacion-2\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/exceltic.serquo.com\/ia-generativa-con-control-total-pequenos-modelos-de-lenguaje-slms-y-cuantizacion-2\/\"},\"author\":{\"name\":\"Serquo Admin\",\"@id\":\"https:\/\/exceltic.serquo.com\/#\/schema\/person\/aa4f97f58379cc64179590f276472ad5\"},\"headline\":\"IA Generativa con Control Total: Peque\u00f1os Modelos de Lenguaje (SLMs) y 
Cuantizaci\u00f3n\",\"datePublished\":\"2025-06-23T14:25:41+00:00\",\"dateModified\":\"2026-03-19T12:28:24+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/exceltic.serquo.com\/ia-generativa-con-control-total-pequenos-modelos-de-lenguaje-slms-y-cuantizacion-2\/\"},\"wordCount\":1557,\"image\":{\"@id\":\"https:\/\/exceltic.serquo.com\/ia-generativa-con-control-total-pequenos-modelos-de-lenguaje-slms-y-cuantizacion-2\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/exceltic.serquo.com\/wp-content\/uploads\/2025\/06\/portada-blog-ia-generativa.png\",\"articleSection\":[\"Eventos\"],\"inLanguage\":\"en-GB\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/exceltic.serquo.com\/ia-generativa-con-control-total-pequenos-modelos-de-lenguaje-slms-y-cuantizacion-2\/\",\"url\":\"https:\/\/exceltic.serquo.com\/ia-generativa-con-control-total-pequenos-modelos-de-lenguaje-slms-y-cuantizacion-2\/\",\"name\":\"IA Generativa con Control Total: Peque\u00f1os Modelos de Lenguaje (SLMs) y Cuantizaci\u00f3n -\",\"isPartOf\":{\"@id\":\"https:\/\/exceltic.serquo.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/exceltic.serquo.com\/ia-generativa-con-control-total-pequenos-modelos-de-lenguaje-slms-y-cuantizacion-2\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/exceltic.serquo.com\/ia-generativa-con-control-total-pequenos-modelos-de-lenguaje-slms-y-cuantizacion-2\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/exceltic.serquo.com\/wp-content\/uploads\/2025\/06\/portada-blog-ia-generativa.png\",\"datePublished\":\"2025-06-23T14:25:41+00:00\",\"dateModified\":\"2026-03-19T12:28:24+00:00\",\"author\":{\"@id\":\"https:\/\/exceltic.serquo.com\/#\/schema\/person\/aa4f97f58379cc64179590f276472ad5\"},\"description\":\"Implementa IA generativa con modelos peque\u00f1os y cuantizados. 
Gana eficiencia, privacidad y control total con SLM y RAG en entornos seguros.\",\"breadcrumb\":{\"@id\":\"https:\/\/exceltic.serquo.com\/ia-generativa-con-control-total-pequenos-modelos-de-lenguaje-slms-y-cuantizacion-2\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/exceltic.serquo.com\/ia-generativa-con-control-total-pequenos-modelos-de-lenguaje-slms-y-cuantizacion-2\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\/\/exceltic.serquo.com\/ia-generativa-con-control-total-pequenos-modelos-de-lenguaje-slms-y-cuantizacion-2\/#primaryimage\",\"url\":\"https:\/\/exceltic.serquo.com\/wp-content\/uploads\/2025\/06\/portada-blog-ia-generativa.png\",\"contentUrl\":\"https:\/\/exceltic.serquo.com\/wp-content\/uploads\/2025\/06\/portada-blog-ia-generativa.png\",\"width\":1920,\"height\":1080},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/exceltic.serquo.com\/ia-generativa-con-control-total-pequenos-modelos-de-lenguaje-slms-y-cuantizacion-2\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Portada\",\"item\":\"https:\/\/exceltic.serquo.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"IA Generativa con Control Total: Peque\u00f1os Modelos de Lenguaje (SLMs) y Cuantizaci\u00f3n\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/exceltic.serquo.com\/#website\",\"url\":\"https:\/\/exceltic.serquo.com\/\",\"name\":\"\",\"description\":\"Ingenier\u00eda y Consultor\u00eda\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/exceltic.serquo.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/exceltic.serquo.com\/#\/schema\/person\/aa4f97f58379cc64179590f276472ad5\",\"name\":\"Serquo 