The Scandal at Kaggle
As I write these words, I marvel at my silver medal from the 2024 Automated Essay Scoring competition on Kaggle. This competition will go down… Read More »The Scandal at Kaggle
TL;DR: We built a bot that suggests a meaningful response to an ongoing conversation thread on GitHub. This bot can serve as a coding and… Read More »Compound AI Systems: Building a GitHub bot with Llama 3 and dltHub
Apple announced their new high-end Mac Pro desktop with 24 CPU cores, up to 76 GPU cores, 192 GB memory and 800 GB/s of system memory… Read More »A DuckDB moment for application servers?
Last week I had the chance to visit a major global fashion retailer and give an industry talk on Real-time AI. This company was hosting… Read More »Time Value of Data: The Summit of Now and the Peak of Soon After
I am playing with Graphext – it’s like Trifacta, but with more powerful data science functionality. If you are a product manager or finance person,… Read More »Graphext, data insights for non-data scientists
Update: Added the Stanford NLP link for constituent parse trees in text form. I needed to visualize a sentence parse tree of the “constituent” variety… Read More »Showing constituent parse trees in the browser
About a month ago I wrote a 3-part blog series (parts 1, 2, and 3) on predicting user engagement with news in Reddit communities (subreddits).… Read More »Predicting user engagement with news on Reddit using Kaggle or Colab
Ever wanted to define alerts and monitor the status of your Cloud Dataflow jobs programmatically instead of checking some UI every 30 minutes? You can… Read More »How to programmatically monitor your Cloud Dataflow jobs
Ever wanted to track your resource usage and costs by specific Cloud Dataflow jobs? Cloud Dataflow recently started labeling billing records with Job Ids. Here… Read More »Calculating per-job Cloud Dataflow costs - now possible with job labels
I am working on the Reddit Community Engagement analysis, and one of my data sources is the GDELT BigQuery dataset. I love the richness of… Read More »Building dictionaries for Word Encodings using BigQuery SQL