Today was mostly conversations.
Are conversations useful?
If they lead to valuable action, then yes, for sure.
But if you’ve watched Stutz on Netflix, you’ll know that real world conversations – and hence socialising – are also vital for your own wellbeing and mental health. It’s more helpful if the conversations are interesting and not poisonous, but any social contact is better than none, funnily enough.
If you are working on your own a lot as a founder, it’s so important to have social contact to ground yourself in reality. It’s very good for your brain.
The sad reality of an AI future is that people will increasingly be communicating with AI-generated responses. But it’s down to each of us to make sure we keep on talking, especially technical founders who are introverted.
I was listening to someone the other day who said that those of us alive now are the final generation of humans who knew what life was like before AI (and robots) started to take over and drive the cost of knowledge and content creation down to almost zero.
Interesting new world!
In other news, did some more R&D on web crawling. It turns out you can use the Crawl4AI Python package in tandem with a language model, and it will automatically run the crawled content through your prompt. I’ll do a video on it another time, but for the moment here is my code. It basically rewrites the BBC article as an excited Arsenal fan.
# Example: using the LLM filter
url2 = "https://www.bbc.co.uk/sport/football/live/c8j00ke2r23t"
success2, content2, file2 = await crawl_url(
    url=url2,
    filter_type="llm",
    llm_instruction="""
    Rewrite this as if you are an excited Arsenal fan.
    Include:
    - Emotive descriptive language of the goals
    Exclude:
    - Navigation elements
    - Sidebars
    - Footer content
    Format the output as clean markdown with proper paragraphs and headers.
    """,
)
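One gotcha if you’re copying this: a top-level `await` like the one above only works in a notebook or IPython. In a plain script you need to wrap the call in a coroutine and hand it to `asyncio.run`. A minimal sketch of the pattern (the `main` wrapper is just my naming, and the stand-in coroutine is there so the snippet runs on its own; in practice you’d `await crawl_url(...)` inside it):

```python
import asyncio

async def main():
    # In a real script this would be:
    #   success, content, file = await crawl_url(url=url2, filter_type="llm", llm_instruction=...)
    # A stand-in await keeps this sketch self-contained.
    await asyncio.sleep(0)
    return "done"

if __name__ == "__main__":
    result = asyncio.run(main())  # drives the event loop to completion
    print(result)
```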
And this is the definition of my custom function:
import asyncio
import os

from dotenv import load_dotenv
from crawl4ai import (
    AsyncWebCrawler,
    BM25ContentFilter,
    CrawlerRunConfig,
    DefaultMarkdownGenerator,
    LLMContentFilter,
    PruningContentFilter,
)
from crawl4ai.async_configs import LlmConfig

async def crawl_url(url, filter_type="prune", query=None, llm_instruction=None):
    """
    Crawl a URL and apply a specified content filter.

    Args:
        url (str): The URL to crawl
        filter_type (str): Type of filter to use - "bm25", "prune", or "llm"
        query (str): Query for BM25 or Pruning filters
        llm_instruction (str): Instruction for LLM filter

    Returns:
        tuple: (success, markdown_content, output_filename)
    """
    # Load environment variables from .env file
    load_dotenv()
    OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")

    # Select the appropriate content filter based on filter_type
    if filter_type == "bm25" and query:
        content_filter = BM25ContentFilter(
            user_query=query,
            bm25_threshold=1.2,
            # use_stemming=True
        )
    elif filter_type == "prune" and query:
        content_filter = PruningContentFilter(
            user_query=query,
            threshold=0.5,
            threshold_type="fixed",  # or "dynamic"
            min_word_threshold=50,
        )
    elif filter_type == "llm" and llm_instruction:
        content_filter = LLMContentFilter(
            llmConfig=LlmConfig(provider="openai/gpt-4o-mini", api_token=OPENAI_API_KEY),
            instruction=llm_instruction,
            chunk_token_threshold=4096,
            verbose=True,
        )
    else:
        # Fall back to the pruning filter if no valid filter/arguments were given
        content_filter = PruningContentFilter(
            user_query=query or "",
            threshold=0.5,
            threshold_type="fixed",
            min_word_threshold=50,
        )

    md_generator = DefaultMarkdownGenerator(
        content_filter=content_filter,
        options={"ignore_links": True},
    )
    config = CrawlerRunConfig(markdown_generator=md_generator)

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url, config=config)

        if not result.success:
            print(f"Crawl failed: {result.error_message}")
            print(f"Status code: {result.status_code}")
            return False, None, None

        # Create a filename based on the URL:
        # remove the protocol and replace special characters
        filename = url.replace("https://", "").replace("http://", "").replace("/", "_").rstrip("_")
        output_file = f"{filename}.md"

        # Write the extracted content to a markdown file
        with open(output_file, "w", encoding="utf-8") as f:
            f.write(result.markdown.fit_markdown)

        print(f"Content successfully exported to {output_file}")
        return True, result.markdown.fit_markdown, output_file