nf-csvext

Install

To install the plugin, add it to the plugins block of your nextflow.config:

nextflow.config

plugins {
    id "nf-csvext@0.1.0"
}

Once installed, the plugin provides several CSV functions.

General specifications

Most functions follow the csv_xxxx pattern, where xxxx describes the intent of the function. For example, csv_sort emits a CSV sorted by a specified column.

Functions accept a file as their first argument, the CSV source. Creation functions are the exception.

Functions accept an optional Map as their last argument. In this map you can provide details about the operation. For example, use the sep entry to specify the separator, or header to indicate whether the header needs to be processed.

In almost all situations, functions need to store files in a temporary directory. For performance reasons, this directory is located in the machine’s local storage, and it should have enough free space. The tempDir option can be used to specify a different temporary directory.
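As a minimal sketch of how the options map is passed, the call below sorts a semicolon-separated file and redirects temporary files. The input path and the /scratch/tmp directory are hypothetical, and it is an assumption that tempDir accepts a path given as a string.

```groovy
include { csv_sort } from 'plugin/nf-csvext'

// Hypothetical input file, assumed to use ';' as separator
params.input = 'data/names.csv'

workflow {
    Channel.fromPath( params.input )
        | map { source ->
            // The options map is given as trailing named arguments.
            // tempDir value is an assumption: any writable local directory.
            csv_sort( source, column: 0, sep: ';', tempDir: '/scratch/tmp' )
        }
        | view
}
```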

csv_concat

Concatenate two (or more) CSV files.

If header is set, the function validates that the files have the same "header size" (number of fields). It does not check that the names are equal or in the same order.

The header of the appended file(s) is not written to the output.

Arguments

source, a Path
appends, a Path or a List<Path>
optional Map as params

Params

| key | type | description | default |
|-----|------|-------------|---------|
| header | boolean | if true, checks that the files have the same number of fields in the header | false |
| sep | string | character used as separator, for example ";" or "\t" | "," |

Example

include { csv_concat } from 'plugin/nf-csvext'

params.input = 'data/names.csv'
params.concat = 'data/concat_names.csv'
csv_file = file(params.input)

workflow {

    Channel.fromPath( params.input)
        | map{ source ->
            csv_concat( source, file(params.concat), header:true )
        }
        | splitCsv(header:true)
        | view
}

If params.input and params.concat are both CSV files with the header name, birthday, quote and one data line each, then the output will be:

[name:Albert Einstein, birthday:14 de marzo de 1879, quote:"La imaginación es más importante que el conocimiento."]
[name:Rosa Parks, birthday:4 de febrero de 1913, quote:"Cada persona debe vivir su vida como un modelo para los demás."]

File collection

To append more than one file to source, pass a List as the second argument:

csv_concat( source, [ file(params.concat), file(params.another_file) ], header:true )

csv_sort

Emit a CSV sorted by a column index (or name).

Arguments

source, a Path
map as params

INFO

The header parameter is implicit, because the header is required to know which column to use.

Params

| key | type | description | default |
|-----|------|-------------|---------|
| column | number or string | column name or index position (starting at 0) to sort by | "0" |
| sep | string | character used as separator, for example ";" or "\t" | "," |

Example

include { csv_sort } from 'plugin/nf-csvext'

workflow {
    Channel.fromPath( params.input )
        | map{ source ->
            csv_sort( source, column: 2) //(1)
        }
        | splitCsv(header:true)
        | view
}

(1) column can be an integer or a String. If it is a String, it must be a column name present in the header.
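Sorting by name instead of index might look like the sketch below. The column name 'name' is an assumption borrowed from the csv_concat example file; replace it with a name that exists in your header.

```groovy
include { csv_sort } from 'plugin/nf-csvext'

workflow {
    Channel.fromPath( params.input )
        | map { source ->
            // 'name' is assumed to be a column in the header of params.input
            csv_sort( source, column: 'name' )
        }
        | splitCsv(header:true)
        | view
}
```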

csv_trim

Remove one or more columns from the CSV and produce a new CSV without these columns

Arguments

source, a Path
map as params

INFO

The header parameter is implicit, because the header is required to know which column to use.

Params

| key | type | description | default |
|-----|------|-------------|---------|
| column | number or string | column name or index position (starting at 0) to remove | "0" |
| sep | string | character used as separator, for example ";" or "\t" | "," |

Example

include { csv_trim } from 'plugin/nf-csvext'

params.trim = 'Cabin,Pclass'

workflow {
    Channel.fromPath( 'https://raw.githubusercontent.com/incsteps/nf-csvext/refs/heads/main/validation/data/titanic.tsv' )
        | map{ source ->
            csv_trim( source, columns:params.trim, sep:'\t') //(1)
        }
        | splitCsv(header:true, sep:'\t')
        | view
}

(1) columns takes a comma-separated list of column names. Use column instead to remove a single column.
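A single-column variant of the example above could be sketched as follows. It reuses the same titanic.tsv file and assumes that column accepts the header name 'Cabin' in the same way columns does.

```groovy
include { csv_trim } from 'plugin/nf-csvext'

workflow {
    Channel.fromPath( 'https://raw.githubusercontent.com/incsteps/nf-csvext/refs/heads/main/validation/data/titanic.tsv' )
        | map { source ->
            // column (singular) removes only the Cabin column
            csv_trim( source, column: 'Cabin', sep: '\t' )
        }
        | splitCsv(header:true, sep:'\t')
        | view
}
```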

csv_prettyprint

Emit a rewritten CSV in which every column is padded to the width of its longest value.

Arguments

source, a Path
map as params

INFO

The header parameter is implicit, because the header is required to know which column to use.

Params

| key | type | description | default |
|-----|------|-------------|---------|
| sep | string | character used as separator in the input, for example ";" or "\t" | "," |
| newSep | string | character used as separator in the output, for example ";" | value of sep |

Example

include { csv_prettyprint } from 'plugin/nf-csvext'


Channel.fromPath( 'https://raw.githubusercontent.com/incsteps/nf-csvext/refs/heads/main/validation/data/titanic.tsv' )
    | map{ source ->
        csv_prettyprint( source, sep:'\t', newSep:';')
    }
    | map{ source ->
        file(source).text
    }
    | view

output

PassengerId;Survived;Pclass;Name                                                                              ;Sex   ;Age ;SibSp;Parch;Ticket            ;Fare    ;Cabin          ;Embarked;
1          ;0       ;3     ;Braund, Mr. Owen Harris                                                           ;male  ;22  ;1    ;0    ;A/5 21171         ;7.25    ;\N             ;S       ;
2          ;1       ;1     ;Cumings, Mrs. John Bradley (Florence Briggs Thayer)                               ;female;38  ;1    ;0    ;PC 17599          ;71.2833 ;C85            ;C       ;
3          ;1       ;3     ;Heikkinen, Miss. Laina                                                            ;female;26  ;0    ;0    ;STON/O2. 3101282  ;7.925   ;\N             ;S       ;
4          ;1       ;1     ;Futrelle, Mrs. Jacques Heath (Lily May Peel)                                      ;female;35  ;1    ;0    ;113803            ;53.1    ;C123           ;S       ;
5          ;0       ;3     ;Allen, Mr. William Henry                                                          ;male  ;35  ;0    ;0    ;373450            ;8.05    ;\N             ;S       ;
...

csv_create

A csv_create operator is provided, similar to collectFile. Once configured, every item received in the channel can be transformed, and then the operator emits a CSV.

Example

include { csv_create } from 'plugin/nf-csvext'

channel.fromList([
    [id:1, name:'a name'],
    [id:2, name:'b name'],
    [id:3, name:'c name'],
])
    .csv_create( headers:['name','id','date'], sep:";"){ sequence->
        sequence['date'] = new Date().toString()
        sequence
    }
    .view()

This example will produce a CSV with name, id and date fields

As you can see, you can not only specify the order of the header but also modify every item using the closure: you can add, remove or transform its fields.

If the item consumed is a List, the item emitted is the concatenation of its elements, joined with sep.

If the item consumed is a Map, the item emitted is the concatenation of the map entries named in the header, in header order, joined with sep.
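The two rules above can be illustrated in plain Groovy. This is only a sketch of the described behaviour, not the plugin's implementation: toCsvLine is a hypothetical helper, and how the plugin treats keys missing from the map is not covered here.

```groovy
// Hypothetical helper, not part of nf-csvext: mimics the List/Map rules described above
String toCsvLine(Object item, List<String> headers, String sep) {
    if (item instanceof Map) {
        // Keep only the entries named in headers, in header order
        return headers.collect { key -> item[key] }.join(sep)
    }
    if (item instanceof List) {
        // Lists are joined as they come
        return item.join(sep)
    }
    return item.toString()
}

println toCsvLine([id: 1, name: 'a name'], ['name', 'id'], ';')
println toCsvLine(['a name', 1], [], ';')
```

Both calls print the same line because the header order (name, id) decides the position of the Map entries, while the List keeps its own order.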