Loading Data

Author

Affiliation

Griffen Wakelin

Dalhousie University

Published

February 6, 2024

Modified

February 6, 2024

The `anndata` object

Single-cell datasets in scanpy are stored as annotated data (anndata) objects, which are provided by the anndata Python package. More complete introductions to the anndata object are given in the scverse’s “Getting started with the anndata package” and anndata’s “Getting started with anndata” tutorials.

The active matrix (`adata.X`)

A simple example of how an anndata object is formatted is shown below. Here, we first create a \(100\times100\) numpy array, x, with entries drawn from a Poisson distribution with \(\lambda = 1\):

import anndata as ad
import numpy as np
import scanpy as sc

x = np.random.poisson(lam = 1.0, size = (100, 100))
print(x)

[[1 1 3 ... 1 1 1]
 [0 2 3 ... 0 1 2]
 [2 2 0 ... 1 0 1]
 ...
 [0 1 2 ... 0 1 0]
 [0 3 1 ... 0 0 0]
 [2 1 1 ... 0 0 3]]

We can create an anndata object from this array alone:

adata = ad.AnnData(X = x)
print(adata)

AnnData object with n_obs × n_vars = 100 × 100

Our original array is accessible through the .X attribute of the anndata object:

adata.X

array([[1, 1, 3, ..., 1, 1, 1],
       [0, 2, 3, ..., 0, 1, 2],
       [2, 2, 0, ..., 1, 0, 1],
       ...,
       [0, 1, 2, ..., 0, 1, 0],
       [0, 3, 1, ..., 0, 0, 0],
       [2, 1, 1, ..., 0, 0, 3]])

Cell- (`adata.obs`) and gene-level (`adata.var`) metadata

anndata objects are arranged so that they can store (and save) cell- and gene-level metadata very easily. Cell-level metadata are stored in a pandas DataFrame accessible through the .obs attribute (obs for observations, a.k.a. cells), and gene-level metadata are stored in a pandas DataFrame accessible through the .var attribute (var for variables, a.k.a. genes).

print(adata.obs, adata.var)

Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

[100 rows x 0 columns] Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

[100 rows x 0 columns]

In the case where you are importing single-cell datasets, these data frames would be indexed by the cell barcode and gene names, respectively, looking something like this:

adata.obs_names = [f'Cell_{i}' for i in range(adata.n_obs)]
adata.var_names = [f'Gene_{i}' for i in range(adata.n_vars)]
print(adata.obs, adata.var)

Empty DataFrame
Columns: []
Index: [Cell_0, Cell_1, Cell_2, Cell_3, Cell_4, Cell_5, Cell_6, Cell_7, Cell_8, Cell_9, Cell_10, Cell_11, Cell_12, Cell_13, Cell_14, Cell_15, Cell_16, Cell_17, Cell_18, Cell_19, Cell_20, Cell_21, Cell_22, Cell_23, Cell_24, Cell_25, Cell_26, Cell_27, Cell_28, Cell_29, Cell_30, Cell_31, Cell_32, Cell_33, Cell_34, Cell_35, Cell_36, Cell_37, Cell_38, Cell_39, Cell_40, Cell_41, Cell_42, Cell_43, Cell_44, Cell_45, Cell_46, Cell_47, Cell_48, Cell_49, Cell_50, Cell_51, Cell_52, Cell_53, Cell_54, Cell_55, Cell_56, Cell_57, Cell_58, Cell_59, Cell_60, Cell_61, Cell_62, Cell_63, Cell_64, Cell_65, Cell_66, Cell_67, Cell_68, Cell_69, Cell_70, Cell_71, Cell_72, Cell_73, Cell_74, Cell_75, Cell_76, Cell_77, Cell_78, Cell_79, Cell_80, Cell_81, Cell_82, Cell_83, Cell_84, Cell_85, Cell_86, Cell_87, Cell_88, Cell_89, Cell_90, Cell_91, Cell_92, Cell_93, Cell_94, Cell_95, Cell_96, Cell_97, Cell_98, Cell_99]

[100 rows x 0 columns] Empty DataFrame
Columns: []
Index: [Gene_0, Gene_1, Gene_2, Gene_3, Gene_4, Gene_5, Gene_6, Gene_7, Gene_8, Gene_9, Gene_10, Gene_11, Gene_12, Gene_13, Gene_14, Gene_15, Gene_16, Gene_17, Gene_18, Gene_19, Gene_20, Gene_21, Gene_22, Gene_23, Gene_24, Gene_25, Gene_26, Gene_27, Gene_28, Gene_29, Gene_30, Gene_31, Gene_32, Gene_33, Gene_34, Gene_35, Gene_36, Gene_37, Gene_38, Gene_39, Gene_40, Gene_41, Gene_42, Gene_43, Gene_44, Gene_45, Gene_46, Gene_47, Gene_48, Gene_49, Gene_50, Gene_51, Gene_52, Gene_53, Gene_54, Gene_55, Gene_56, Gene_57, Gene_58, Gene_59, Gene_60, Gene_61, Gene_62, Gene_63, Gene_64, Gene_65, Gene_66, Gene_67, Gene_68, Gene_69, Gene_70, Gene_71, Gene_72, Gene_73, Gene_74, Gene_75, Gene_76, Gene_77, Gene_78, Gene_79, Gene_80, Gene_81, Gene_82, Gene_83, Gene_84, Gene_85, Gene_86, Gene_87, Gene_88, Gene_89, Gene_90, Gene_91, Gene_92, Gene_93, Gene_94, Gene_95, Gene_96, Gene_97, Gene_98, Gene_99]

[100 rows x 0 columns]

Additional metadata slots (`adata.uns`, `adata.obsm`, `adata.varm`, `adata.obsp`)

As you perform a normal single-cell workflow, different slots of the anndata object will gradually be populated:

# Quality Control
sc.pp.calculate_qc_metrics(adata, percent_top=None, log1p=False, inplace=True)
print(adata)

AnnData object with n_obs × n_vars = 100 × 100
    obs: 'n_genes_by_counts', 'total_counts'
    var: 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts'

# PCA
sc.pp.pca(adata)
print(adata)

AnnData object with n_obs × n_vars = 100 × 100
    obs: 'n_genes_by_counts', 'total_counts'
    var: 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts'
    uns: 'pca'
    obsm: 'X_pca'
    varm: 'PCs'

# Neighbors graph construction + Clustering
sc.pp.neighbors(adata)
sc.tl.leiden(adata)
print(adata)

AnnData object with n_obs × n_vars = 100 × 100
    obs: 'n_genes_by_counts', 'total_counts', 'leiden'
    var: 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts'
    uns: 'pca', 'neighbors', 'leiden'
    obsm: 'X_pca'
    varm: 'PCs'
    obsp: 'distances', 'connectivities'

We can view various slots of the anndata object to see which metadata it is holding:

adata.obs

	n_genes_by_counts	total_counts	leiden
Cell_0	60	88	2
Cell_1	64	99	0
Cell_2	65	102	5
Cell_3	52	79	4
Cell_4	72	111	1
...	...	...	...
Cell_95	59	113	5
Cell_96	64	110	6
Cell_97	60	106	2
Cell_98	60	85	3
Cell_99	63	101	0

100 rows × 3 columns

adata.var

	n_cells_by_counts	mean_counts	pct_dropout_by_counts	total_counts
Gene_0	60	0.88	40.0	88
Gene_1	60	1.04	40.0	104
Gene_2	59	0.92	41.0	92
Gene_3	63	1.01	37.0	101
Gene_4	60	1.00	40.0	100
...	...	...	...	...
Gene_95	70	1.20	30.0	120
Gene_96	62	0.91	38.0	91
Gene_97	63	0.98	37.0	98
Gene_98	59	0.85	41.0	85
Gene_99	66	0.94	34.0	94

100 rows × 4 columns

Loading in data

The goal of loading in data is to have the anndata object formatted as previously described, with the counts matrix in the .X slot, the cell barcode names in the .obs_names slot, and the gene names in the .var_names slot.

10x Genomics Data

In the case where you have a 10x Genomics dataset in the Matrix Market format, scanpy provides a convenience function which will organize the data automatically:

!ls outs/filtered_feature_bc_matrix/

barcodes.tsv.gz features.tsv.gz matrix.mtx.gz

import scanpy as sc

adata = sc.read_10x_mtx("outs/filtered_feature_bc_matrix/")
print(adata)

AnnData object with n_obs × n_vars = 7489 × 32285
    var: 'gene_ids', 'feature_types'

This is also true for 10x datasets in the H5 format:

adata = sc.read_10x_h5("outs/filtered_feature_bc_matrix.h5")
print(adata)

AnnData object with n_obs × n_vars = 7489 × 32285
    var: 'gene_ids', 'feature_types', 'genome'

See 10x Genomics’ support page on cellranger outputs for more details on how their output files are organized.

Tabular data

Datasets from platforms besides 10x do not have an agreed-upon format but tend to be tabular in nature (in the form of .csv or .tsv files). In these cases, you need to do the data wrangling yourself.

As an example (from GSE129114), here is a dataset generated using the Smart-seq2 protocol. I will first load in the data using a naive approach, calling pandas.read_csv() with its default arguments:

import pandas as pd
df = pd.read_csv("GSE129114_E9.5_anterior_Sox10_counts.txt")
df

	SS2_17_554_A21 SS2_17_554_A20 SS2_17_554_A9 SS2_17_554_A19 SS2_17_554_A24 SS2_17_554_A22 SS2_17_554_A11 SS2_17_554_A23 SS2_17_554_A14 SS2_17_554_A17 SS2_17_554_A16 SS2_17_554_A15 SS2_17_554_A13 SS2_17_554_A18 SS2_17_554_A10 SS2_17_554_A12 SS2_17_554_B14 SS2_17_554_B9 SS2_17_554_B23 SS2_17_554_B13 SS2_17_554_B16 SS2_17_554_B12 SS2_17_554_B10 SS2_17_554_B11 SS2_17_554_B22 SS2_17_554_B21 SS2_17_554_B18 SS2_17_554_C11 SS2_17_554_B17 SS2_17_554_C10 SS2_17_554_B19 SS2_17_554_B15 SS2_17_554_C9 SS2_17_554_B20 SS2_17_554_B24 SS2_17_554_C12 SS2_17_554_C17 SS2_17_554_C13 SS2_17_554_C14 SS2_17_554_C21 SS2_17_554_C15 SS2_17_554_C20 SS2_17_554_C18 SS2_17_554_C16 SS2_17_554_C19 SS2_17_554_C22 SS2_17_554_C23 SS2_17_554_C24 SS2_17_554_D9 SS2_17_554_D16 SS2_17_554_D10 SS2_17_554_D13 SS2_17_554_D15 SS2_17_554_D11 SS2_17_554_D12 SS2_17_554_D14 SS2_17_554_D18 SS2_17_554_D17 SS2_17_554_D19 SS2_17_554_D20 SS2_17_554_D21 SS2_17_554_E10 SS2_17_554_D22 SS2_17_554_D23 SS2_17_554_E15 SS2_17_554_D24 SS2_17_554_E9 SS2_17_554_E14 SS2_17_554_E13 SS2_17_554_E11 SS2_17_554_E12 SS2_17_554_E16 SS2_17_554_E19 SS2_17_554_E18 SS2_17_554_E17 SS2_17_554_E21 SS2_17_554_E20 SS2_17_554_E23 SS2_17_554_E22 SS2_17_554_E24 SS2_17_554_F13 SS2_17_554_F11 SS2_17_554_F10 SS2_17_554_F9 SS2_17_554_F14 SS2_17_554_F12 SS2_17_554_F15 SS2_17_554_F16 SS2_17_554_F20 SS2_17_554_F17 SS2_17_554_F19 SS2_17_554_F18 SS2_17_554_F22 SS2_17_554_F21 SS2_17_554_F24 SS2_17_554_F23 SS2_17_554_G11 SS2_17_554_G9 SS2_17_554_G12 SS2_17_554_G15 SS2_17_554_G13 SS2_17_554_G14 SS2_17_554_G20 SS2_17_554_G21 SS2_17_554_G17 SS2_17_554_G19 SS2_17_554_G10 SS2_17_554_G16 SS2_17_554_G18 SS2_17_554_G22 SS2_17_554_G24 SS2_17_554_G23 SS2_17_554_H13 SS2_17_554_H16 SS2_17_554_H10 SS2_17_554_H9 SS2_17_554_H18 SS2_17_554_H19 SS2_17_554_H15 SS2_17_554_H11 SS2_17_554_H24 SS2_17_554_H12 SS2_17_554_H23 SS2_17_554_H21 SS2_17_554_H22 SS2_17_554_H17 SS2_17_554_H14 SS2_17_554_H20 SS2_17_554_I16 SS2_17_554_I15 SS2_17_554_I11 SS2_17_554_I9 SS2_17_554_I13 
SS2_17_554_I17 SS2_17_554_I14 SS2_17_554_I19 SS2_17_554_I12 SS2_17_554_I10 SS2_17_554_I22 SS2_17_554_I24 SS2_17_554_I20 SS2_17_554_I18 SS2_17_554_I21 SS2_17_554_I23 SS2_17_554_J9 SS2_17_554_J12 SS2_17_554_J15 SS2_17_554_J19 SS2_17_554_J11 SS2_17_554_J13 SS2_17_554_J10 SS2_17_554_J14 SS2_17_554_J16 SS2_17_554_J18 SS2_17_554_J20 SS2_17_554_J23 SS2_17_554_J17 SS2_17_554_J21 SS2_17_554_J24 SS2_17_554_J22 SS2_17_554_K11 SS2_17_554_K12 SS2_17_554_K9 SS2_17_554_K10 SS2_17_554_K13 SS2_17_554_K14 SS2_17_554_K19 SS2_17_554_K21 SS2_17_554_K15 SS2_17_554_K22 SS2_17_554_K17 SS2_17_554_K16 SS2_17_554_K24 SS2_17_554_K18 SS2_17_554_L9 SS2_17_554_K20 SS2_17_554_L10 SS2_17_554_K23 SS2_17_554_L11 SS2_17_554_L12 SS2_17_554_L13 SS2_17_554_L14 SS2_17_554_L15 SS2_17_554_L16 SS2_17_554_L17 SS2_17_554_L18 SS2_17_554_L23 SS2_17_554_L20 SS2_17_554_L19 SS2_17_554_L22 SS2_17_554_L21 SS2_17_554_L24 SS2_17_554_M11 SS2_17_554_M10 SS2_17_554_M9 SS2_17_554_M12 SS2_17_554_M16 SS2_17_554_M13 SS2_17_554_M14 SS2_17_554_M17 SS2_17_554_M15 SS2_17_554_M18 SS2_17_554_M21 SS2_17_554_N12 SS2_17_554_M20 SS2_17_554_M19 SS2_17_554_M22 SS2_17_554_M23 SS2_17_554_M24 SS2_17_554_N11 SS2_17_554_N16 SS2_17_554_N9 SS2_17_554_N19 SS2_17_554_N10 SS2_17_554_N14 SS2_17_554_N13 SS2_17_554_N17 SS2_17_554_N15 SS2_17_554_N18 SS2_17_554_N21 SS2_17_554_N20 SS2_17_554_N22 SS2_17_554_N23 SS2_17_554_N24 SS2_17_554_O11 SS2_17_554_O10 SS2_17_554_O13 SS2_17_554_O9 SS2_17_554_O12 SS2_17_554_O14 SS2_17_554_O15 SS2_17_554_O18 SS2_17_554_O16 SS2_17_554_O17 SS2_17_554_O19 SS2_17_554_O20 SS2_17_554_O22 SS2_17_554_O23 SS2_17_554_O24 SS2_17_554_O21 SS2_17_554_P10 SS2_17_554_P9 SS2_17_554_P11 SS2_17_554_P13 SS2_17_554_P14 SS2_17_554_P16 SS2_17_554_P15 SS2_17_554_P12 SS2_17_554_P23 SS2_17_554_P18 SS2_17_554_P17 SS2_17_554_P24 SS2_17_554_P19 SS2_17_554_P21 SS2_17_554_P22 SS2_17_554_P20 SS2_18_102_A8 SS2_18_102_B2 SS2_18_102_A9 SS2_18_102_A3 SS2_18_102_A4 SS2_18_102_B6 SS2_18_102_B1 SS2_18_102_A5 SS2_18_102_A6 SS2_18_102_A7 SS2_18_102_A2 
SS2_18_102_A1 SS2_18_102_A10 SS2_18_102_B5 SS2_18_102_B3 SS2_18_102_B4 SS2_18_102_B7 SS2_18_102_B10 SS2_18_102_B8 SS2_18_102_B9 SS2_18_102_C2 SS2_18_102_C5 SS2_18_102_C1 SS2_18_102_C4 SS2_18_102_C8 SS2_18_102_C10 SS2_18_102_C9 SS2_18_102_C7 SS2_18_102_C3 SS2_18_102_C6 SS2_18_102_D2 SS2_18_102_D9 SS2_18_102_D5 SS2_18_102_D3 SS2_18_102_D1 SS2_18_102_D8 SS2_18_102_D4 SS2_18_102_D6 SS2_18_102_D7 SS2_18_102_D10 SS2_18_102_E2 SS2_18_102_E1 SS2_18_102_E5 SS2_18_102_E4 SS2_18_102_E3 SS2_18_102_E6 SS2_18_102_F1 SS2_18_102_F3 SS2_18_102_E9 SS2_18_102_F4 SS2_18_102_E7 SS2_18_102_F2 SS2_18_102_F5 SS2_18_102_E10 SS2_18_102_E8 SS2_18_102_F6 SS2_18_102_F7 SS2_18_102_F9 SS2_18_102_F8 SS2_18_102_F10 SS2_18_102_G1 SS2_18_102_G5 SS2_18_102_G2 SS2_18_102_G3 SS2_18_102_G4 SS2_18_102_G8 SS2_18_102_G6 SS2_18_102_G7 SS2_18_102_G10 SS2_18_102_G9 SS2_18_102_H2 SS2_18_102_H1 SS2_18_102_H4 SS2_18_102_H3 SS2_18_102_H6 SS2_18_102_H5 SS2_18_102_H10 SS2_18_102_H8 SS2_18_102_H7 SS2_18_102_H9 SS2_18_102_I1 SS2_18_102_I3 SS2_18_102_I2 SS2_18_102_I5 SS2_18_102_I4 SS2_18_102_I6 SS2_18_102_I10 SS2_18_102_I7 SS2_18_102_I9 SS2_18_102_I8 SS2_18_102_J2 SS2_18_102_J5 SS2_18_102_J1 SS2_18_102_J3 SS2_18_102_J4 SS2_18_102_J6 SS2_18_102_J8 SS2_18_102_J7 SS2_18_102_J9 SS2_18_102_J10 SS2_18_102_K2 SS2_18_102_K8 SS2_18_102_K3 SS2_18_102_K4 SS2_18_102_K1 SS2_18_102_K5 SS2_18_102_K9 SS2_18_102_K7 SS2_18_102_K6 SS2_18_102_K10 SS2_18_102_L2 SS2_18_102_L4 SS2_18_102_L3 SS2_18_102_L1 SS2_18_102_L5 SS2_18_102_L6 SS2_18_102_L8 SS2_18_102_L9 SS2_18_102_M4 SS2_18_102_M2 SS2_18_102_M3 SS2_18_102_L10 SS2_18_102_L7 SS2_18_102_M1 SS2_18_102_M5 SS2_18_102_M6 SS2_18_102_M8 SS2_18_102_M7 SS2_18_102_M9 SS2_18_102_M10 SS2_18_102_N1 SS2_18_102_N9 SS2_18_102_N2 SS2_18_102_N3 SS2_18_102_N6 SS2_18_102_N4 SS2_18_102_N8 SS2_18_102_N5 SS2_18_102_N7 SS2_18_102_N10 SS2_18_102_O1 SS2_18_102_O2 SS2_18_102_O9 SS2_18_102_O10 SS2_18_102_O3 SS2_18_102_O6 SS2_18_102_O4 SS2_18_102_O5 SS2_18_102_O7 SS2_18_102_O8 SS2_18_102_P7 SS2_18_102_P2 
SS2_18_102_P8 SS2_18_102_P9 SS2_18_102_P1 SS2_18_102_P10 SS2_18_102_P5 SS2_18_102_P3 SS2_18_102_P6 SS2_18_102_P4 SS2_18_110_H21 SS2_18_110_H15 SS2_18_110_H13 SS2_18_110_H16 SS2_18_110_H20 SS2_18_110_H18 SS2_18_110_H14 SS2_18_110_H23 SS2_18_110_H24 SS2_18_110_H17 SS2_18_110_H19 SS2_18_110_H22 SS2_18_110_I14 SS2_18_110_I15 SS2_18_110_I16 SS2_18_110_I20 SS2_18_110_I17 SS2_18_110_I23 SS2_18_110_I13 SS2_18_110_I19 SS2_18_110_I18 SS2_18_110_I24 SS2_18_110_I22 SS2_18_110_I21 SS2_18_110_J14 SS2_18_110_J15 SS2_18_110_J19 SS2_18_110_J20 SS2_18_110_J16 SS2_18_110_J13 SS2_18_110_J24 SS2_18_110_J22 SS2_18_110_J18 SS2_18_110_J21 SS2_18_110_J17 SS2_18_110_J23 SS2_18_110_K13 SS2_18_110_K22 SS2_18_110_K21 SS2_18_110_K14 SS2_18_110_K19 SS2_18_110_K16 SS2_18_110_K24 SS2_18_110_K15 SS2_18_110_K20 SS2_18_110_K18 SS2_18_110_K17 SS2_18_110_K23 SS2_18_110_L21 SS2_18_110_L13 SS2_18_110_L20 SS2_18_110_L14 SS2_18_110_L24 SS2_18_110_L17 SS2_18_110_L15 SS2_18_110_L19 SS2_18_110_L22 SS2_18_110_L18 SS2_18_110_L23 SS2_18_110_L16 SS2_18_110_M13 SS2_18_110_M20 SS2_18_110_M17 SS2_18_110_M16 SS2_18_110_M14 SS2_18_110_M15 SS2_18_110_M24 SS2_18_110_M18 SS2_18_110_M21 SS2_18_110_N15 SS2_18_110_M19 SS2_18_110_M22 SS2_18_110_N14 SS2_18_110_M23 SS2_18_110_N17 SS2_18_110_N20 SS2_18_110_N19 SS2_18_110_N23 SS2_18_110_N18 SS2_18_110_N13 SS2_18_110_N24 SS2_18_110_N16 SS2_18_110_N21 SS2_18_110_N22 SS2_18_110_O13 SS2_18_110_O19 SS2_18_110_O14 SS2_18_110_O15 SS2_18_110_O20 SS2_18_110_O21 SS2_18_110_O23 SS2_18_110_O22 SS2_18_110_O16 SS2_18_110_O17 SS2_18_110_O18 SS2_18_110_O24 SS2_18_110_P15 SS2_18_110_P21 SS2_18_110_P14 SS2_18_110_P18 SS2_18_110_P17 SS2_18_110_P22 SS2_18_110_P24 SS2_18_110_P23 SS2_18_110_P13 SS2_18_110_P16 SS2_18_110_P20 SS2_18_110_P19
0	Adora1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0...
1	Sntg1 0 1 0 0 0 0 0 0 0 0 0 11 0 0 0 0 0 0 0 0...
2	Prim2 2 15 7 22 0 24 9 7 71 7 27 74 4 14 4 0 3...
3	Bai3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
4	Cflar 0 0 0 0 0 0 0 1 4 0 0 0 0 2 3 0 7 1 3 0 ...
...	...
23714	Tmlhe 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
23715	Vamp7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
23716	Spry3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
23717	Zf12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
23718	eGFP 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...

23719 rows × 1 columns

The default call assumes comma-separated values, so each space-delimited line was read into a single column. With only a few modifications to our pandas.read_csv() call, we can parse the data so that it can be made into an anndata object:

df = pd.read_csv("GSE129114_E9.5_anterior_Sox10_counts.txt", delimiter=" ", header=0, index_col=0)
df

	SS2_17_554_A21	SS2_17_554_A20	SS2_17_554_A9	SS2_17_554_A19	SS2_17_554_A24	SS2_17_554_A22	SS2_17_554_A11	SS2_17_554_A23	SS2_17_554_A14	SS2_17_554_A17	...	SS2_18_110_P14	SS2_18_110_P18	SS2_18_110_P17	SS2_18_110_P22	SS2_18_110_P24	SS2_18_110_P23	SS2_18_110_P13	SS2_18_110_P16	SS2_18_110_P20	SS2_18_110_P19
Adora1	0	0	0	0	0	0	0	0	0	0	...	21	0	0	0	0	0	0	0	0	0
Sntg1	0	1	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
Prim2	2	15	7	22	0	24	9	7	71	7	...	0	12	25	0	0	0	39	44	27	9
Bai3	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
Cflar	0	0	0	0	0	0	0	1	4	0	...	8	0	0	0	0	0	0	0	0	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Tmlhe	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
Vamp7	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
Spry3	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
Zf12	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
eGFP	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0

23719 rows × 524 columns

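The final step is to create the anndata object. Because the table has genes as rows and cells as columns, it is transposed so that cells become the rows of .X: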

obs_names = df.columns.values
var_names = df.index.values
x = df.to_numpy().T

adata = ad.AnnData(X = x)
adata.obs_names = obs_names
adata.var_names = var_names

print(adata)

AnnData object with n_obs × n_vars = 524 × 23719

Data from R

The other common case is a single-cell dataset stored in a format specific to R (either a SingleCellExperiment or a Seurat object), which tends to be saved in an .RDS file. This situation is quite a bit more complex than the other two, and I would recommend consulting someone who has experience converting these objects into formats amenable to analysis in Python.
