cudf-analytics

Name: cudf-analytics
Author: langchain-ai/deepagents

$npx mdskill add langchain-ai/deepagents/cudf-analytics

Accelerate large-scale data analysis by performing computations directly on NVIDIA GPUs.

Handles statistical summaries, grouping, and outlier detection on tabular data.
Integrates with NVIDIA cuDF for GPU-accelerated data manipulation.
Triggers when tasks require profiling or complex aggregations on datasets.
Returns results via a pandas-compatible structure, optimized for downstream use.

SKILL.md

.github/skills/cudf-analyticsView on GitHub ↗

---
name: cudf-analytics
description: Use for GPU-accelerated data analysis on datasets, CSVs, or tabular data using NVIDIA cuDF. Triggers when tasks involve groupby aggregations, statistical summaries, anomaly detection, or large-scale data profiling.
---

# cuDF Analytics Skill

GPU-accelerated data analysis using NVIDIA RAPIDS cuDF. cuDF provides a pandas-like API that runs on NVIDIA GPUs, enabling massive speedups on large datasets.

## When to Use This Skill

Use this skill when:
- Analyzing CSV files, datasets, or tabular data
- Computing statistical summaries (mean, median, std, quartiles)
- Performing groupby aggregations
- Detecting anomalies or outliers in data
- Profiling datasets with millions of rows
- Computing correlation matrices

## Initialization (REQUIRED)

Always start every script with this boilerplate. It tests actual GPU operations, not just import.

```python
import pandas as pd

try:
    import cudf
    # Smoke-test: verify GPU compute AND host transfer both work
    _test = cudf.Series([1, 2, 3])
    assert _test.sum() == 6
    assert _test.to_pandas().tolist() == [1, 2, 3]
    GPU = True
except Exception as e:
    print(f"[GPU] cudf unavailable, falling back to pandas: {e}")
    GPU = False

def read_csv(path):
    return cudf.read_csv(path) if GPU else pd.read_csv(path)

def to_pd(df):
    """Convert cuDF DataFrame/Series to pandas. Use this instead of .to_pandas() directly."""
    if not GPU:
        return df
    try:
        return df.to_pandas()
    except Exception as e:
        print(f"[GPU] .to_pandas() failed, using Arrow fallback: {e}")
        return df.to_arrow().to_pandas()
```

## Quick Reference

cuDF mirrors the pandas API. Common operations:

### Read Data
```python
df = read_csv("data.csv")
```

### Statistical Summary
```python
# Use to_pd() when you need pandas output
summary = to_pd(df[["value", "score"]].describe())

# Scalar values work directly with float()
mean_val = float(df["value"].mean())
q1 = float(df["value"].quantile(0.25))

# Correlation
corr = float(df["value"].corr(df["score"]))
```

### Groupby Aggregation
```python
result = df.groupby("category").agg({
    "revenue": ["sum", "mean", "count"],
    "quantity": ["sum", "mean"],
})
result_pd = to_pd(result)
```

### Anomaly Detection (IQR Method)
```python
col = "value"
Q1 = float(df[col].quantile(0.25))
Q3 = float(df[col].quantile(0.75))
IQR = Q3 - Q1
lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR
outliers = to_pd(df[(df[col] < lower) | (df[col] > upper)])
```

### Anomaly Detection (Z-Score Method)
```python
mean = float(df[col].mean())
std = float(df[col].std())
df["z_score"] = (df[col] - mean) / std
anomalies = to_pd(df[df["z_score"].abs() > 3])
```

### Filtering and Selection
```python
# Filter rows
filtered = df[df["status"] == "active"]

# Select columns
subset = df[["name", "revenue", "date"]]

# Sort
sorted_df = df.sort_values("revenue", ascending=False)

# Convert to pandas for final output / iteration
result_pd = to_pd(sorted_df)
```

## Data Type Requirements

cuDF requires explicit type specification for optimal performance:
- Use `float32` or `float64` for numeric data
- Use `int32` or `int64` for integer data
- String columns use cuDF's string dtype automatically

## Output Guidelines

When reporting analysis results:
- Include dataset dimensions (rows x columns)
- Show key statistics in formatted tables
- Highlight notable patterns, trends, or anomalies
- Provide both summary statistics and specific examples
- Note any data quality issues (missing values, outliers)

More from langchain-ai/deepagents

Skill	Description
analyze-market	>-
arxiv-search	Searches arXiv for preprints and academic papers, retrieves abstracts, and filters by topic. Use when the user asks to find research papers, search arXiv, look up preprints, find academic articles in physics, math, CS, biology, statistics, or related fields.
blog-post	Writes and structures long-form blog posts, creates tutorial outlines, and optimizes content for SEO with cover image generation. Use when the user asks to write a blog post, article, how-to guide, tutorial, technical writeup, thought leadership piece, or long-form content.
code-review	Perform a structured code review of changes, checking for correctness, style, tests, and potential issues.
coding-prefs	Read the user's coding preferences from /memory/coding-prefs.md before making non-trivial style decisions, and append new preferences when the user gives durable feedback.
competitor-analysis	>-
cuml-machine-learning	Use for GPU-accelerated machine learning on tabular data using NVIDIA cuML. Triggers when tasks involve classification, regression, clustering, dimensionality reduction, or model training on datasets.
data-visualization	Use for creating publication-quality charts and multi-panel analysis summaries. Triggers when tasks involve visualizing data, plotting results, creating charts, or producing visual reports from analysis output.
gpu-document-processing	Use when processing large PDFs, document collections, or bulk text extraction tasks that benefit from GPU-accelerated processing. Triggers when the user provides large documents or needs bulk document analysis.
langgraph-docs	Fetches and references LangGraph Python documentation to build stateful agents, create multi-agent workflows, and implement human-in-the-loop patterns. Use when the user asks about LangGraph, graph agents, state machines, agent orchestration, LangGraph API, or needs LangGraph implementation guidance.