Langchain Is Pointless

It’s filled with crap like this:

    for i in range(n_results, 0, -1):
        try:
            return self._collection.query(
                query_texts=query_texts,
                query_embeddings=query_embeddings,
                n_results=i,
                where=where,
                **kwargs,
            )
        except chromadb.errors.NotEnoughElementsException:
            logger.error(
                f"Chroma collection {self._collection.name} "
                f"contains fewer than {i} elements."
            )

and this:

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        texts = list(map(lambda x: x.replace("\n", " "), texts))
        embeddings = self.client.encode(texts, **self.encode_kwargs)
        return embeddings.tolist()

and this:

    class CharacterTextSplitter(TextSplitter):
        """Implementation of splitting text that looks at characters."""

        def __init__(self, separator: str = "\n\n", **kwargs: Any):
            """Create a new TextSplitter."""
            super().__init__(**kwargs)
            self._separator = separator

        def split_text(self, text: str) -> List[str]:
            """Split incoming text and return chunks."""
            # First we naively split the large input into a bunch of smaller ones.
            if self._separator:
                splits = text.split(self._separator)
            else:
                splits = list(text)
            return self._merge_splits(splits, self._separator)
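
To make that point concrete (the sample text here is mine, not LangChain's), the "split" step of that dedicated class is exactly one call to the built-in `str.split`:

```python
# The separator is a plain string, so the naive split step
# is nothing more than str.split on that string.
text = "first paragraph\n\nsecond paragraph\n\nthird paragraph"
chunks = text.split("\n\n")
print(chunks)  # ['first paragraph', 'second paragraph', 'third paragraph']
```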

In short: https://i.imgur.com/OffEJTR.gifv

Embeddings is just a do-nothing wrapper around SentenceTransformers. Chroma is just a do-nothing wrapper around ChromaDB. The library is filled with “helper” functions that simply call ordinary Python functions. A dedicated TextSplitter that calls split() from builtins.py? What? Why? Templates are no more useful than calling .replace() on a string. “Texts” are just strings, and “documents” are just a pointless dict that contains “texts.” Just load the strings from your data source yourself. The README is both grandiose and vague. The documentation is out of date and inconsistent. The import footprint is weirdly massive: the code is highly modularized, yet nothing seems to do anything that would take more than a few CPU cycles. There is no standard interoperable datatype, so you are actually led further afield than if you had just clearly defined the simple lists and strings required for hitting an LLM.
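
To illustrate the template point (the example strings below are mine, for illustration only), a prompt “template” is ordinary string substitution that Python already does:

```python
# A prompt "template" is just string substitution -- str.format
# (or str.replace) covers the whole feature.
template = "Answer the following question concisely: {question}"
prompt = template.format(question="What does str.split do?")
print(prompt)
```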

The very concept of chaining operations when interacting with LLMs doesn’t really make sense to me: it’s basically one requests call to a generation backend, and it’s not like the library even handles websockets and streaming for you. Why chain together wrapper classes when you can just do the operations yourself?
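
A sketch of “doing the operations yourself” (the function names are mine, and the backend call is stubbed; in practice it would be a single requests.post to your generation endpoint):

```python
# "Chaining" retrieval -> prompt -> completion without a framework
# is plain function composition.

def retrieve(query: str) -> list[str]:
    # Stand-in for a vector-store lookup; returns context strings.
    return ["Paris is the capital of France."]

def build_prompt(context: list[str], query: str) -> str:
    # Gluing context onto a question is string concatenation.
    return "\n".join(context) + "\n\nQuestion: " + query

def call_llm(prompt: str) -> str:
    # Stub. Really this is one requests.post to a generation backend.
    return "Paris"

query = "What is the capital of France?"
answer = call_llm(build_prompt(retrieve(query), query))
print(answer)
```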

This seems like a beginner’s project that blew up because it’s riding a tidal wave of interest in the broader topic.
