First Steps#
This guide walks you through the core TanaT workflow: loading data, choosing the right sequence type, and exploring your temporal data.
Note
Make sure TanaT is installed: pip install tanat (see Installation).
1. Prepare Your Data#
TanaT works with pandas DataFrames containing temporal data:
import pandas as pd
# Sample data: patient visits
data = pd.DataFrame({
'patient_id': ['P001', 'P001', 'P001', 'P002', 'P002'],
'visit_date': pd.to_datetime([
'2023-01-15', '2023-02-20', '2023-03-10',
'2023-01-20', '2023-03-15'
]),
'visit_type': ['GP', 'SPECIALIST', 'GP', 'GP', 'EMERGENCY']
})
2. Choose the Right Sequence Type#
Before creating a pool, identify which sequence type matches your data:
Type |
Your data has… |
Example |
|---|---|---|
EventSequence |
Single timestamps (punctual events) |
Medical visits, purchases, clicks |
IntervalSequence |
Start + end dates (can overlap) |
Treatments, hospital stays, projects |
StateSequence |
Contiguous states (no gaps, no overlap) |
Disease stages, employment status |
For our example, visits are punctual events so we use EventSequencePool.
3. Create a Sequence Pool#
A pool groups sequences from multiple individuals:
from tanat.sequence import EventSequencePool
pool = EventSequencePool(data, settings={
"id_column": "patient_id",
"time_column": "visit_date",
"entity_features": ["visit_type"]
})
4. Verify Inferred Metadata#
When you display the pool, TanaT shows a summary including automatically inferred metadata. It’s important to verify this inference is correct before proceeding:
# Display pool summary with inferred metadata
print(pool)
┌──────────────────────────────────────────────────┐
│ EventSequencePool summary │
└──────────────────────────────────────────────────┘
STATISTICS
─────────────────────────
Total sequences 2
Average length 2.5
...
Metadata:
Temporal:
Type: datetime
Granularity: DAY
Entity Features (1):
- visit_type: categorical
You can also get a compact metadata view:
print(pool.metadata.describe())
If the inference is incorrect, you can update the metadata:
# Example: correct the timezone
pool.update_temporal_metadata(timezone="Europe/Paris")
# Example: specify ordered categories
pool.update_entity_metadata(
feature_name="visit_type",
categories=["GP", "SPECIALIST", "EMERGENCY"],
ordered=True
)
See also
Metadata for complete metadata documentation.
5. Access Individual Sequences#
# Get a specific patient's sequence
patient = pool['P001']
print(f"Patient P001: {len(patient)} visits")
# View the underlying data
print(patient.sequence_data)
6. Access Individual Entities#
Within a sequence, you can access individual entities (observations):
# Get the first entity (visit) in the sequence
first_visit = patient[0]
# Access entity properties
print(f"Temporal extent: {first_visit.extent}") # 2023-01-15 00:00:00
print(f"Value: {first_visit.value}") # GP
# Iterate over all entities
for entity in patient:
print(f"{entity.extent}: {entity.value}")
Next Steps#
- Core Concepts
Deep dive into TanaT core concepts: sequences, trajectories, pools, settings
- Examples Gallery
See complete examples with metrics, visualization, and clustering
- Metadata
Complete metadata reference and update methods
- Reference
Full API documentation