Install
To install the plugin, add it to your nextflow.config:
plugins {
    id "nf-csvext@0.1.0"
}
Once installed, the plugin provides several CSV functions.
General specifications
Most functions follow the csv_xxxx naming pattern, where xxxx describes the intent of the function. For example, csv_sort emits a CSV sorted by a specified column.
Functions accept a file as their first argument, which is the CSV source. Creation functions are the exception.
Functions accept an optional Map as their last argument. In this map you can provide details about the operation. For example, use the sep entry to specify the separator, or header to indicate whether the header needs to be processed.
In almost all situations, functions need to store files in a temporary directory. For performance reasons, this directory is located in the machine’s local storage, and it should have enough free space. The tempDir option can be used to specify a different temporary directory.
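As a minimal sketch of these conventions, the call below passes the source file first and the options as trailing named arguments, mirroring the documented examples. The file names data/part1.csv and data/part2.csv and the /scratch/csvext directory are hypothetical, and passing tempDir as a String path is an assumption that should be checked against the plugin.

```groovy
include { csv_concat } from 'plugin/nf-csvext'

// Hypothetical semicolon-separated input files
params.input  = 'data/part1.csv'
params.append = 'data/part2.csv'

workflow {
    Channel.fromPath( params.input )
        | map { source ->
            // Source file first, options last.
            // sep and header are documented options; tempDir given as a
            // String path is an assumption, not confirmed by the docs.
            csv_concat( source, file(params.append),
                        header: true, sep: ';', tempDir: '/scratch/csvext' )
        }
        | splitCsv( header: true, sep: ';' )
        | view
}
```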
csv_concat
Concatenates two (or more) CSV files.
If header is true, it validates that the files have the same number of header fields (it does not check that the names or their order match).
The headers of the appended file(s) are not written to the output.
Arguments
- source, a Path
- appends, a Path or a List<Path>
- optional Map as params
Params
key | type | description | default |
header | boolean | if true, checks that all files have the same number of fields in the header | false |
sep | string | character used as the separator, for example ";" or "\t" | "," |
Example
include { csv_concat } from 'plugin/nf-csvext'
params.input = 'data/names.csv'
params.concat = 'data/concat_names.csv'
csv_file = file(params.input)
workflow {
    Channel.fromPath( params.input )
        | map { source ->
            csv_concat( source, file(params.concat), header:true )
        }
        | splitCsv(header:true)
        | view
}
If params.input and params.concat are both CSV files with the columns name, birthday and quote, and one data line each, then the output will be:
[name:Albert Einstein, birthday:14 de marzo de 1879, quote:"La imaginación es más importante que el conocimiento."]
[name:Rosa Parks, birthday:4 de febrero de 1913, quote:"Cada persona debe vivir su vida como un modelo para los demás."]
File collection
To append more than one file to source, pass a List as the second argument:
csv_concat( source, [ file(params.concat), file(params.another_file) ], header:true )
csv_sort
Emits a CSV sorted by a column index (or name).
Arguments
- source, a Path
- map as params
INFO: The header parameter is implicit, as it is required to know which column to use.
Params
key | type | description | default |
column | number or string | the column name or index position (starting at 0) to use | "0" |
sep | string | character used as the separator, for example ";" or "\t" | "," |
Example
include { csv_sort } from 'plugin/nf-csvext'
workflow {
    Channel.fromPath( params.input )
        | map { source ->
            csv_sort( source, column: 2 ) // (1)
        }
        | splitCsv(header:true)
        | view
}
(1) column can be an integer or a String. If it is a String, it must be a column name present in the header.
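To illustrate sorting by name rather than by index, here is a small sketch that uses the titanic.tsv file from the plugin's validation data. It assumes that Name is one of the header fields of that file and that the file is tab-separated.

```groovy
include { csv_sort } from 'plugin/nf-csvext'

workflow {
    Channel.fromPath( 'https://raw.githubusercontent.com/incsteps/nf-csvext/refs/heads/main/validation/data/titanic.tsv' )
        | map { source ->
            // column given as a String: it must match a header field
            csv_sort( source, column: 'Name', sep: '\t' )
        }
        | splitCsv( header: true, sep: '\t' )
        | view
}
```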
csv_trim
Remove one or more columns from the CSV and produce a new CSV without these columns
Arguments
- source, a Path
- map as params
INFO: The header parameter is implicit, as it is required to know which column to use.
Params
key | type | description | default |
column | number or string | the column name or index position (starting at 0) to use | "0" |
sep | string | character used as the separator, for example ";" or "\t" | "," |
Example
include { csv_trim } from 'plugin/nf-csvext'
params.trim = 'Cabin,Pclass'
workflow {
    Channel.fromPath( 'https://raw.githubusercontent.com/incsteps/nf-csvext/refs/heads/main/validation/data/titanic.tsv' )
        | map { source ->
            csv_trim( source, columns:params.trim, sep:'\t') // (1)
        }
        | splitCsv(header:true, sep:'\t')
        | view
}
(1) Use column to remove a single column.
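A minimal sketch of the single-column form, assuming column accepts a header name here in the same way the params table describes. It removes only the Cabin column from the same titanic.tsv file.

```groovy
include { csv_trim } from 'plugin/nf-csvext'

workflow {
    Channel.fromPath( 'https://raw.githubusercontent.com/incsteps/nf-csvext/refs/heads/main/validation/data/titanic.tsv' )
        | map { source ->
            // column (singular) drops one column; here it is given by name
            csv_trim( source, column: 'Cabin', sep: '\t' )
        }
        | splitCsv( header: true, sep: '\t' )
        | view
}
```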
csv_prettyprint
Emits a rewritten CSV in which every column is padded to the width of its longest value.
Arguments
- source, a Path
- map as params
INFO: The header parameter is implicit, as it is required to know which column to use.
Params
key | type | description | default |
sep | string | character used as the separator in the input, for example ";" or "\t" | "," |
newSep | string | character used as the separator in the output, for example ";" | value of sep |
Example
include { csv_prettyprint } from 'plugin/nf-csvext'
Channel.fromPath( 'https://raw.githubusercontent.com/incsteps/nf-csvext/refs/heads/main/validation/data/titanic.tsv' )
    | map { source ->
        csv_prettyprint( source, sep:'\t', newSep:';')
    }
    | map { source ->
        file(source).text
    }
    | view
PassengerId;Survived;Pclass;Name ;Sex ;Age ;SibSp;Parch;Ticket ;Fare ;Cabin ;Embarked;
1 ;0 ;3 ;Braund, Mr. Owen Harris ;male ;22 ;1 ;0 ;A/5 21171 ;7.25 ;\N ;S ;
2 ;1 ;1 ;Cumings, Mrs. John Bradley (Florence Briggs Thayer) ;female;38 ;1 ;0 ;PC 17599 ;71.2833 ;C85 ;C ;
3 ;1 ;3 ;Heikkinen, Miss. Laina ;female;26 ;0 ;0 ;STON/O2. 3101282 ;7.925 ;\N ;S ;
4 ;1 ;1 ;Futrelle, Mrs. Jacques Heath (Lily May Peel) ;female;35 ;1 ;0 ;113803 ;53.1 ;C123 ;S ;
5 ;0 ;3 ;Allen, Mr. William Henry ;male ;35 ;0 ;0 ;373450 ;8.05 ;\N ;S ;
...
csv_create
The csv_create operator works similarly to collectFile. Once configured, every item received in the channel can be transformed, and then the operator emits a CSV.
Example
include { csv_create } from 'plugin/nf-csvext'
channel.fromList([
        [id:1, name:'a name'],
        [id:2, name:'b name'],
        [id:3, name:'c name'],
    ])
    .csv_create( headers:['name','id','date'], sep:";") { sequence ->
        sequence['date'] = new Date().toString()
        sequence
    }
    .view()
This example will produce a CSV with name, id and date fields.
As you can see, you can not only specify the order of the header but also modify every item using the closure: you can add, remove or transform its fields.
If the item consumed is a List, the item emitted is a concatenation of its elements, using sep as the separator.
If the item consumed is a Map, the item emitted is a concatenation of the map entries named in headers, using sep as the separator.
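The List case can be sketched as follows. This is an illustration under assumptions rather than confirmed plugin behavior: it assumes the closure may simply return the item unchanged and that headers is still used for the header line when items are Lists. By the rule above, an item such as [1, 'a name'] should be written as 1;a name.

```groovy
include { csv_create } from 'plugin/nf-csvext'

workflow {
    channel.fromList([
            [1, 'a name'],
            [2, 'b name'],
        ])
        .csv_create( headers: ['id', 'name'], sep: ';' ) { row ->
            // Identity closure: each List is passed through unchanged
            // (assumption: a closure that returns the item as-is is accepted)
            row
        }
        .view()
}
```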