Genestack Platform overview

Clicking on your username (your email address) in the top right corner of the page gives you access to your profile and allows you to sign out of the platform.

../_images/WP_profile.png

In this section you can change your name, password, the name of your organisation and your vendor ID.

../_images/profile.png

Organizations are a way of enforcing group permissions. There are two types of user in an organization – administrators and non-administrators. If you are in the same organization as another user, you can add them to groups you control and share files with them freely. If you are in different organizations, administrators from both organizations first need to approve adding them to the group. You can learn more about data sharing, permissions and groups in the Sharing data and collaboration section.
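The sharing rule described above can be pictured with a short Python sketch. This is only an illustration of the rule as stated; the function name and the `approved_by` parameter are hypothetical and are not part of any Genestack API.

```python
def can_add_to_group(owner_org, member_org, approved_by=frozenset()):
    """Return True if a user from member_org can be added to a group
    controlled by a user from owner_org.

    approved_by is a hypothetical set of organization names whose
    administrators have approved the request.
    """
    if owner_org == member_org:
        # Same organization: no approval is needed.
        return True
    # Different organizations: administrators from both must approve.
    return {owner_org, member_org} <= set(approved_by)


print(can_add_to_group("Lab A", "Lab A"))
print(can_add_to_group("Lab A", "Lab B", approved_by={"Lab A", "Lab B"}))
```

In this sketch, an approval from only one of the two organizations is not enough to add the user.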

Vendor IDs are used for application development. Applications you have created will be marked with your vendor ID.

Tasks links to the Task Manager application, where you can monitor running and previous computations.

Data Browser links to the Data Browser application, where you can browse public, private and shared data and search through it using complex queries.

Wherever you are on the platform, you can also access a shortcuts menu by clicking on the button in the top left corner of any platform page. It is an easy way to reach the most commonly used applications and folders. Data Browser, Import Data, Manage Applications, Manage Groups, Expression Data Miner, Differential Expression Similarity Search, and Import Template Editor, as well as the folders for created and imported files, can all be found here. You can also click User Guide to access the user documentation.

../_images/shortcuts.png

Let’s look deeper into each of these items.

Manage applications

../_images/manage_app.png

Here you can view the list of all applications available on the platform – both ones you have written as well as public ones.

The Developer button will give you the option to choose which version of an application you want to use.

../_images/developer_button.png

The ‘minified’ option optimizes loading of the CSS and JS used in the application. You can find more details on minifying in a blog post by Dino Esposito.

The Session and User dropdown menus allow you to choose the version of the application you want to use for your current log-in session and for your current user account, respectively. Inherit is the default option, and the order of version choice inheritance is Global → User → Session. If you change the version of an application, you also need to reload it to run the version of your choice.
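One way to read the inheritance order is that the most specific setting that is not Inherit wins: a Session choice overrides a User choice, which overrides the Global version. The Python sketch below illustrates that reading only; the function, its parameters and the version strings are hypothetical.

```python
INHERIT = "inherit"  # hypothetical marker for the default "Inherit" option


def resolve_version(global_version, user_choice=INHERIT, session_choice=INHERIT):
    """Pick the application version in effect.

    Assumed reading of Global -> User -> Session: a setting left on
    Inherit falls back to the level before it.
    """
    if session_choice != INHERIT:
        return session_choice
    if user_choice != INHERIT:
        return user_choice
    return global_version


print(resolve_version("1.0"))
print(resolve_version("1.0", user_choice="1.1"))
print(resolve_version("1.0", user_choice="1.1", session_choice="2.0-dev"))
```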

Manage groups

../_images/manage_groups.png

In order to share data, we use groups. In the Manage Groups section you can change the settings of your current collaboration groups or create new ones and invite other users to join. You can also view and accept all the invitations you have received from other users. Read more about collaboration on Genestack in the Sharing Data and Collaboration section.

Manage users

In this section, you can change the passwords of your users or create new users. If you click on Manage Users, you will go to the user management screen. Every user in Genestack Platform belongs to an organisation. When you signed up to use Genestack via the sign-up dialog, we created a new organisation for you, and you automatically became its first user and its administrator. As an organisation administrator, you can create as many new users for your organisation as you want. For instance, you can create accounts for your colleagues. Being in one organisation means you can share data without any restrictions. The user management screen gives you an overview of all users in your organisation. You can change a user’s password, make any user an administrator or lock a user out of the system.

../_images/first-user.png

You can also create new users. Let’s create a Second User by clicking the Create user button.

../_images/second-user.png

You will need to set the user name, email and password. Users added this way are immediately confirmed, and can log in right away.

You can find more about managing users on Genestack from this video.

../_images/manage_users.png

Importing data

Here is a list of file types that can be imported into Genestack. Note that gzipped (.gz) and zipped (.zip) files are also supported.

| Genestack file type | Description | Supported file formats |
|---|---|---|
| Continuous Genomic Data | Contains information on continuous genome statistics, e.g. GC% content | WIGGLE, WIG |
| Discrete Genomic Data | Information on discrete regions of the genome with an exact start and end position | BED |
| Gene Expression Signature | The file includes the list of genes and expression pattern (Log FC) specific to an organism phenotype with possibly additional annotation | |
| Gene List | Stores a list of genes with possibly additional annotation | |
| Gene Signature Database | A list of annotated gene sets, that can be used in enrichment analysis | GMT |
| Infinium Methylation Beta Values | Methylation data matrices contained Beta-values methylation ratios for Illumina Infinium Microarrays | TSV, TXT |
| Infinium Microarray Data | Raw intensity data files for Illumina Infinium Microarrays | IDAT |
| Mapped Reads | Reads aligned to a specific reference genome | BAM, CRAM |
| Methylation Array Annotation | Methylation chip annotation containing information about association of microarray probe to known genes | TSV |
| Microarray Annotation | Annotation file containing information about association of microarray probes to biological entities like genes, transcripts and proteins | TXT, CSV |
| Microarray Data | Raw microarray data obtained from a microarray experiment | CEL (Affymetrix), TXT (Agilent), GPR (GenePix) |
| Ontology Files | Files used to annotate metadata | OWL, OBO, CSV |
| Raw Reads | Raw sequencing data | FASTQ, SRA, FASTA+QUAL |
| Reference Genomes | Reference genome sequence for a specific organism with annotation | FASTA + GFF, FASTA + GTF, FASTA + GFF3 |
| Variation Files | Genetic variations files, storing gene sequence variations | VCF |

Note

Import of Gene Expression Signature and Gene List files

If the file contains both gene names and log fold changes, it is imported as a Gene Expression Signature. If the file only contains gene names, it is imported as a Gene List. The importer looks at the headers of the .tsv file to try to detect which columns correspond to gene names or log fold changes (common variations are supported, such as ‘gene’/‘symbol’ for gene names and ‘logFC’/‘log fold change’ for log fold changes). If it fails to detect them, you will be asked to manually choose the file type and specify the file headers corresponding to gene names or log fold changes. Gene symbols and Ensembl/Entrez gene IDs are currently supported for gene names.
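The decision rule in this note can be sketched in Python as follows. This is a simplified illustration, not the importer's actual implementation: the header sets contain only the variations named above, and the function name is hypothetical.

```python
import csv
import io

# Header variations named in the note; the real importer may accept more.
GENE_HEADERS = {"gene", "symbol"}
LOGFC_HEADERS = {"logfc", "log fold change"}


def detect_file_type(tsv_text):
    """Guess the Genestack file type from the header row of a TSV file."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    headers = [h.strip().lower() for h in next(reader, [])]
    has_gene = any(h in GENE_HEADERS for h in headers)
    has_logfc = any(h in LOGFC_HEADERS for h in headers)
    if has_gene and has_logfc:
        return "Gene Expression Signature"
    if has_gene:
        return "Gene List"
    # Nothing detected: the user would be asked to choose manually.
    return None


print(detect_file_type("gene\tlogFC\nTP53\t1.2\n"))
print(detect_file_type("Symbol\nTP53\nBRCA1\n"))
```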

When you import files that are detected as raw sequencing or microarray data, Genestack automatically creates a dataset, a special type of folder, and adds the assays to it. Additional documents in any format (e.g. PDF, Word, text, etc.) can be imported as attachments to a dataset. We will discuss the use of attachments below. Some types of files, namely Reference Genome, Gene List, Gene Expression Signature, Gene Signature Database, Genetic Variations, Ontology Files, Dictionary, Microarray Annotation, Methylation Array Annotation, Infinium Beta Values, are not wrapped in datasets on import because they are rarely uploaded and processed as batches.

When you perform an analysis on Genestack, other data types can be created that cannot be imported, such as:

  • Affymetrix/Agilent/GenePix Microarrays Normalisation — file with normalized Affymetrix/Agilent/GenePix microarrays data;
  • Differential Expression Statistics — expression statistics for change in expression of individual genes or other genomic features between groups of samples, such as fold-changes, p-values, FDR, etc.;
  • Genome Annotations — a technical file used for matching GO terms and gene symbols to gene coordinates;
  • Mapped Read Counts — file is produced from Mapped Reads and contains the number of reads mapped to each feature of a reference sequence.

There are several ways you can access the Import application:

  • using the Import data link on the Dashboard;
../_images/WP_import.png
  • clicking the Import button in the File Manager;
../_images/FM_import.png
  • using an import template. We will describe what an import template is and how to use it later in the guide.
../_images/IT_import.png

Data import consists of three steps: first, temporary Upload files with your data are created in the platform; then, a biological data type is assigned to your imported data; finally, you can fill in all required metadata or import it from a text file.

Step 1: Getting data into the platform

There are two ways to have your data imported into the platform:

  1. Upload data from your computer: select or drag-and-drop files.
../_images/import_start.png
  2. Import from URLs (FTP or HTTP/HTTPS): specify URLs for separate files or directories.
../_images/URL_import.png

Furthermore, you can reuse your previous Upload files instead of uploading the same data again: select existing files with the Use previous uploads option and then add more data if necessary. This is useful, for example, when you import a dataset with several samples and one of the files was chosen incorrectly or is corrupted. In this case, you only need to upload that one sample again and can reuse all the other previously uploaded files.

Note

What is an Upload file?

The Upload file is a temporary file that is automatically created during the data import process. The only purpose of Upload files is to temporarily store the data until the corresponding Genestack files are created and initialized correctly. It is the Genestack files that are used in subsequent bioinformatic analysis, which is why the platform can periodically remove Upload files without any data being lost.

Data uploading from your computer is carried out in multiple streams to increase upload speed. Import from URLs is performed in the background, which means that even while these files are being uploaded, you can edit their metadata and use them in pipelines.

../_images/uploading_step.png

If during uploading you lose your Internet connection, you will be able to resume unfinished uploads later.

../_images/resumed_uploads.png

Click the Import files button to proceed.

Step 2: Format recognition

After your data is uploaded, Genestack automatically recognizes file formats and transforms them into biological data types: raw reads, mapped reads, reference genomes, etc. All format conversions will be handled internally by Genestack. You will not have to worry about formats at all.

../_images/file_recognition.png
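To see why some files are recognized unambiguously while others may need a manual choice, consider the extensions in the table of supported formats. The Python sketch below maps extensions to candidate data types. It is purely illustrative: the guide does not describe how Genestack recognizes formats, and actual recognition is unlikely to rely on file extensions alone. The function name and the mapping are assumptions built from that table.

```python
from pathlib import PurePath

# Extension -> candidate Genestack file types, taken from the table of
# supported formats. Some extensions are shared by several types.
CANDIDATES = {
    ".wig": ["Continuous Genomic Data"],
    ".bed": ["Discrete Genomic Data"],
    ".gmt": ["Gene Signature Database"],
    ".idat": ["Infinium Microarray Data"],
    ".bam": ["Mapped Reads"],
    ".cram": ["Mapped Reads"],
    ".cel": ["Microarray Data"],
    ".gpr": ["Microarray Data"],
    ".owl": ["Ontology Files"],
    ".obo": ["Ontology Files"],
    ".csv": ["Microarray Annotation", "Ontology Files"],
    ".fastq": ["Raw Reads"],
    ".sra": ["Raw Reads"],
    ".vcf": ["Variation Files"],
}
COMPRESSED = {".gz", ".zip"}  # gzipped and zipped files are also supported


def candidate_types(filename):
    """Return the data types a file name could correspond to (illustrative)."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    # Ignore compression suffixes such as .gz in "reads.fastq.gz".
    while suffixes and suffixes[-1] in COMPRESSED:
        suffixes.pop()
    if not suffixes:
        return []
    return CANDIDATES.get(suffixes[-1], [])


print(candidate_types("sample_1.fastq.gz"))
print(candidate_types("probes.csv"))
print(candidate_types("experiment_plan.pdf"))
```

A file with several candidates or none corresponds to the case where you may need to assign the type manually.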

If files are unrecognized or recognized incorrectly, you can manually allocate them to a specific data type: drag the Upload file and move it to the green “Choose type” box at the top of the page.

../_images/unrecognized_uploads.png

Choose the data type you find suitable:

../_images/file_types_box.png

Click the Create files button to proceed.

Step 3: Editing metainfo

During this step, the import has already completed, and you can describe uploaded data using an Excel-like spreadsheet.

../_images/import_edit_metainfo.png

By default, you see all metainfo fields available for the files. You can fill them in or create new custom columns: click the Add column button, name the new metainfo field and choose its type (Text, Integer, etc.):

../_images/add_metainfo_field.png

You can also choose to apply a naming scheme. This allows you to generate file names automatically based on other metainfo attributes.

../_images/naming_scheme.png
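Conceptually, a naming scheme joins the values of selected metainfo fields into a file name. A minimal Python sketch of this idea follows; the function name, the separator and the rule of skipping empty fields are assumptions for illustration, not the platform's documented behaviour.

```python
def apply_naming_scheme(metainfo, keys, separator=" - "):
    """Build a file name from the chosen metainfo fields.

    Fields that are missing or empty are skipped (an assumption).
    """
    parts = [str(metainfo[key]) for key in keys if metainfo.get(key)]
    return separator.join(parts)


sample = {"Organism": "Mus musculus", "Tissue": "liver", "Sex": ""}
print(apply_naming_scheme(sample, ["Organism", "Tissue", "Sex"]))
```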

Metainfo fields can be associated with specific dictionaries and ontologies. We pre-uploaded some public dictionaries such as the NCBI Taxonomy database for the “Organism” field, the Cellosaurus (a resource on cell lines), the ChEBI for chemical compounds, and the Cell Ontology (cell types in animals).

We also created our own controlled vocabularies to cover Sex, Method and Platform fields. You can find out more about ontologies in the Managing metadata section.

Import with templates

You can create your own custom dictionary by importing it into the platform as an OWL, OBO or CSV file and attaching it to the import template.

Note

What is an import template?

Import templates allow you to select what metainfo attributes of your imported files will be tightly controlled (so you don’t lose any information in the process). Import templates allow you to set default fields for file metadata based on file type (e.g. Datasets, Discrete Genomic Data, Genetic Variations, etc.). Of course, if you’re only importing mapped reads, you don’t need to specify metainfo attributes for other data types.

You can select which import template to use in two ways: from the Dashboard, or during the third step of the import process by right-clicking on the import template name (“Default template” is the public one). You can create a copy of an existing import template with the Make a copy option in the context menu.

../_images/copy-import-template.png

Genestack will attempt to fill metainfo fields automatically, but you can always edit the contents manually during the import process. By using metainfo templates you can make sure that all of your files will be adequately and consistently described so you will not lose any valuable information. For example, here is the list of metainfo attributes used by default to describe Reference Genome data:

../_images/default_import_template.png

The Import template editor application allows you to modify existing import templates and create new ones with proper metainfo fields, requirements and controlled vocabularies. To access the application, right-click on a template’s name and select Import template editor from the “Manage” submenu. To create a new template based on the default one, you can also click Add import template on the Dashboard.

../_images/import_templates.png

Now let’s say you wish to create an import template to control the metainfo attributes of raw reads (e.g. you always need to know the tissue and sex of your samples). To do this, click on Add import template, find the table related to Raw Reads, and for the fields “tissue” and “sex” change the Required setting to Yes. As you can see, the system controls what type of information you can put into your metainfo fields. In this case, for tissue the system will map your entries to the Uberon ontology (an integrative multi-species anatomy ontology), and the metainfo type must be text.

../_images/edit-template.png

If you want to add other metainfo fields that are not already included in the table, you can do this in the blank rows at the bottom of the table. For each entry, you must specify whether or not the field is required and what its metainfo type is (e.g. text, yes/no, integer).

../_images/metainfo_type_editor.png
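A template of this kind can be thought of as a set of rules, each stating whether a field is required and what type its value must have. The Python sketch below checks one sample's metainfo against such rules. The data structure, the function name and the optional "age" field are hypothetical; this is not how Genestack stores templates.

```python
# Hypothetical template: "tissue" and "sex" are required text fields,
# "age" is an optional integer field added for illustration.
TEMPLATE = {
    "tissue": {"required": True, "type": str},
    "sex": {"required": True, "type": str},
    "age": {"required": False, "type": int},
}


def validate(metainfo, template=TEMPLATE):
    """Return a list of problems found in one sample's metainfo."""
    problems = []
    for key, rule in template.items():
        value = metainfo.get(key)
        if value is None or value == "":
            if rule["required"]:
                problems.append(f"missing required field: {key}")
            continue
        if not isinstance(value, rule["type"]):
            expected = rule["type"].__name__
            problems.append(f"wrong type for {key}: expected {expected}")
    return problems


print(validate({"tissue": "liver", "sex": "female"}))
print(validate({"tissue": "liver"}))
```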

If you are using a file kind that is not yet listed, you can add a new one by clicking on the Add file kind button. Keep in mind that file kinds are defined in Genestack — you will not be able to create a template entry for a file kind that is not used on the platform.

When you are done, click on the blue Import using this template button. This will take you to the Import Data app, where you can go through the three import steps described above. You can find all the imported files in the “Imported” folder which can be accessed from the Dashboard and from the File Manager.

Metadata import

Apart from editing metainformation manually, you can also import and validate the metainfo attached to the assays and to the dataset on the platform.

../_images/import_from_spreadsheet.png

Click the Import data from spreadsheet button and select a local CSV or Excel file containing the metadata you would like to associate with the imported files.

../_images/import_metainfo.png

Note that the names in the first column of the metadata file must exactly match the names of the data samples on the platform, as shown in the first “Name” column. For example, in our case the metainfo for the second sample does not match any assay and is highlighted in red.

../_images/import_metainfo_table_red.png
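The exact-match rule can be illustrated with a short Python sketch that reads a CSV file and separates rows whose name matches a sample from rows that do not. The function name and the example data are hypothetical; only the rule of matching on the first column comes from the text above.

```python
import csv
import io


def match_metadata(csv_text, sample_names):
    """Split spreadsheet rows into matched and unmatched by exact name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    name_column = reader.fieldnames[0]  # the first column holds the names
    known = set(sample_names)
    matched, unmatched = {}, []
    for row in reader:
        name = row[name_column]
        fields = {k: v for k, v in row.items() if k != name_column}
        if name in known:
            matched[name] = fields
        else:
            unmatched.append(name)  # would be highlighted in red
    return matched, unmatched


csv_text = "Name,Organism\nSample 1,Homo sapiens\nSample2,Homo sapiens\n"
matched, unmatched = match_metadata(csv_text, ["Sample 1", "Sample 2"])
print(matched)
print(unmatched)
```

Here "Sample2" differs from "Sample 2" by a single space, so it is reported as unmatched.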

Use the Select file option to manually allocate the imported metadata to an appropriate file.

../_images/import_metainfo-select-file.png

Columns that are mapped to a metainfo field from the dataset’s template (by default data are imported with “Default” template) are highlighted in green.

../_images/import_metainfo_table-green.png

At this step, by clicking on a column header, you can specify for each column whether it should be imported and whether it should be mapped to a metainfo key from the import template.

../_images/metainfo-import-matching.png

Click Import when you finish editing the table. As a result, the table on the Metainfo Editor page is filled in with metadata from the Excel file.

../_images/import_metainfo_complete.png

Attachments

While importing a dataset into Genestack, you can also attach various files to it, for example a PDF file with the experiment plan or an R script. When you open your newly imported dataset, all of the attachments will accompany it. They are safely stored on Genestack, so you can download them from the platform later if they get lost on your computer.

How to upload an attachment?

Attachments should be uploaded together with the dataset. In the Data Import application, choose the attachments from your computer along with your dataset. The platform will recognize the raw data, and all additional files that were unrecognised will be added to the dataset as attachments.

../_images/attachments.png

You can also upload more attachments or remove existing ones in the Metainfo Editor.

../_images/exp_attachments.png

Browsing data

Efficient data search and browsing are at the core of Genestack. The platform provides rapid access to private, shared, and public data and analysis results.

Data browser

Our platform provides you with a rich collection of freely accessible datasets that we imported from various well-known repositories, such as GEO NCBI, ENA, SRA and ArrayExpress. Data is synchronized regularly from these databases, keeping things up-to-date. There are currently more than 3 million sequencing and microarray assays from over 100,000 public datasets indexed in Genestack. All the public datasets and assays are accompanied by the original metainformation describing the biological data. Generally, this information is not standardized, which makes operations on biological data, such as browsing data, combining assays from several datasets or reproducing an analysis, difficult or even impossible without human participation. To harmonize raw metadata, we apply automated curation in which we map raw entries to controlled terms that we store and maintain in special files called Dictionaries. To prepare these Dictionaries, we adopted terms from external ontologies or created them manually. You can also use our standardized and unified terminology to describe your own data or analysis results.

The Data Browser allows you to browse these public datasets, as well as your private data and the data shared with you on Genestack. You can access the Data Browser either from the Dashboard or from the Shortcuts menu on the left-hand side.

You can search for relevant data with a free-text query, and you can further narrow down the datasets by metadata attributes using the checkboxes on the left. These attributes are generated from the metadata associated with the datasets. For instance, you can set the filters “Access”, “Method” and “Organism” to “Public”, “Whole Exome Sequencing” and “Mus musculus”, respectively, to find publicly accessible mouse WES data.

../_images/data-browser.png
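Filtering by metadata attributes amounts to keeping only the datasets whose metadata matches every selected value. The Python sketch below illustrates this with a small in-memory list; the dataset names and the function are hypothetical and do not reflect how the Data Browser is implemented.

```python
# Hypothetical datasets described by the attributes used in the example above.
DATASETS = [
    {"Name": "Dataset A", "Access": "Public",
     "Method": "Whole Exome Sequencing", "Organism": "Mus musculus"},
    {"Name": "Dataset B", "Access": "Public",
     "Method": "RNA-Seq", "Organism": "Mus musculus"},
    {"Name": "Dataset C", "Access": "Private",
     "Method": "Whole Exome Sequencing", "Organism": "Mus musculus"},
]


def filter_datasets(datasets, filters):
    """Keep the datasets whose metadata matches every filter value."""
    return [
        d for d in datasets
        if all(d.get(key) == value for key, value in filters.items())
    ]


public_mouse_wes = filter_datasets(DATASETS, {
    "Access": "Public",
    "Method": "Whole Exome Sequencing",
    "Organism": "Mus musculus",
})
print([d["Name"] for d in public_mouse_wes])
```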

The Data Browser also allows you to find bioinformatics analysis results associated with raw data. If analyses have been performed on a given dataset and you have access to the results (i.e. they are yours, or they are shared with you), you will find both intermediate results and reports in the Downstream column.

../_images/analysis-results.png

You can also merge data from several datasets into a single combined dataset or share several datasets with a collaborator at once. To do so, select several datasets and click the Merge… or Share… button, respectively, on the “Briefcase bar” that appears at the bottom of the screen.

../_images/data-browser-combine.png

If not all the samples meet your search criteria, you can create a subset of a dataset with the matching samples and process them separately. To do so, click the link showing the number of matching files in the Matched column of the Data Browser, then click the Make a subset with matching files button to save the files that match the current filters. You can also make a subset on the Metainfo Editor page.

../_images/subset-in-databrowser.png

Clicking on the name of any of the datasets will take you to the Metainfo Editor, where you can view (and possibly edit) the metadata of this dataset and its assays.

../_images/metainfo-editor.png

In addition, you can start building a pipeline step by step directly from the Metainfo Editor page via the Analyse button.

../_images/new-df.png

If you want to analyse some part of your dataset, select samples and click the Make a subset button (by default all subsets are created in the folder My datasets).

../_images/make-subset-ME.png

Click a subset name to open it in the Metainfo Editor application and edit its metainformation if needed.

../_images/subset-edit-metainfo.png

If you are the owner of a given dataset, you can add more samples to it by clicking the Upload more files button.

../_images/dataset-upload-more.png

You can also remove files from a dataset: select the files you want to exclude and click the Remove files from dataset button.

../_images/dataset-remove-1.png

If you are sure, confirm the removal by clicking the Remove button. Remember that if the files you are going to exclude from a dataset are not used anywhere else, they will be deleted from the platform and cannot be restored.

../_images/dataset-remove-2.png

If your dataset is made from subsets of other datasets, use the metainfo filters in File Provenance. Open the dataset in File Provenance to see which metadata values were used to select the samples, so you can be sure that no significant data was omitted.

../_images/metainfo-filter.png

File manager

As in any operating system, the File Manager is where you can easily access all of your files, organise them into folders and open them with various applications.

../_images/file-manager.png

The panel (tree view) on the left-hand side is the file system navigator. Here you can see many different folders. Some special folders are worth mentioning:

Created files is the folder where any new file created by an application on Genestack goes.

Imported files is where imported data goes, organized by date: all files imported at the same time (during one import action) will be located in the same folder.

Uploads contains all the files you have uploaded into Genestack, such as FASTQ and BAM files, PDF documents and Excel tables.

Note

What is the difference between uploads and imported files?

When you have just started importing your files (in various formats like FASTQ, BAM etc.), they all go to the specific storage area (the “Uploads” folder). During import, Genestack will recognize these uploaded files and allocate them to appropriate biological types (you can also do it manually), e.g. sequencing assays, mapped reads etc. These meaningful biological objects are what you work with on Genestack, and these are located in the “Imported files” folder.

The Exports folder contains data ready for export. See the Data export section for more information.

Shared with me gives access to all files that other users have shared with you or that you have shared with other users. See the Sharing data and collaboration section for more details.

Public Data folder contains all of the goodies we have preloaded on Genestack to make life a bit simpler for our users. This folder contains:

../_images/public-data.png
  1. Codon tables: currently 18 different tables such as yeast mitochondrial, vertebrate mitochondrial, blepharisma macronuclear etc.;
  2. Default template: an import template that is used by default in the data import process. It provides the list of optional and required metadata fields for each file kind. An ontology or a dictionary can be associated with metadata keys to validate metainfo;
  3. Dictionaries: dictionaries include terms from external ontologies and are used to curate and harmonize metainfo, e.g. sex, platform, NCBI taxonomy;
  4. Example results: so you can play around with our platform and see what types of visualizations are available;
  5. External databases: sets of sequences with associated annotation, e.g. Greengenes for 16S rRNA;
  6. Genome annotations: for a range of different organisms and platforms (for WES analysis);
  7. Microarray annotations: annotation lists to be used as the translation table to link probes and common public domain sequences;
  8. Public analyses: all files created during re-analysis of previously published datasets;
  9. Reference genomes: various reference genomes for the most commonly analysed organisms;
  10. Public data flows: all data flows available to our users, including tutorial data flows and the ones found on the Dashboard;
  11. Public experiments: this is a feature we are particularly proud of. We have pre-loaded the platform with thousands of publicly available datasets from public repositories such as GEO, ArrayExpress, SRA, and ENA. Currently, we have more than 110,000 datasets in our database;
  12. Tutorials: the folder contains files we use as examples during various tutorials.

To access the context menu for a given file, either right-click or left-click the respective entry in the file browser. The topmost entry is the application that was used to generate this file or the application that should be used to view it. The next four entries are submenus for each of the four different types of applications that can be used on the file. Further down are options for viewing and re-using the pipeline used to generate the file. The final section allows you to manage file locations and names. For folders, left-clicking opens the folder, while right-clicking opens the menu. The Add to and Move to actions allow you to link or move a file to a chosen directory.

Note

This does not perform a copy

We use the word “linking” and not “copying” in this context because in Genestack, adding a file to a folder does not physically create a duplicate of that file (unlike copy-pasting in your traditional operating system). It just adds a link to that file from the folder (similar to symbolic links on UNIX).

Show all parent containers shows you a list of all the folders in which the current file is linked. The file accession is a unique identifier attached to each file. Unlike other metainfo attributes, it will never change for any file.

../_images/parent-containers.png
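The linking model and the Show all parent containers action can be pictured with a small in-memory Python sketch. It assumes nothing about Genestack internals: the classes, the function and the accession value are hypothetical and only illustrate that a folder holds a link to a file rather than a copy.

```python
class File:
    def __init__(self, accession, name):
        self.accession = accession  # unique identifier that never changes
        self.name = name


class Folder:
    def __init__(self, name):
        self.name = name
        self.links = []  # references to File objects, not copies

    def add(self, file):
        """Link a file into this folder without duplicating it."""
        if file not in self.links:
            self.links.append(file)


def parent_containers(file, folders):
    """List the names of all folders in which the file is linked."""
    return [folder.name for folder in folders if file in folder.links]


reads = File("GSF000001", "Sample 1 raw reads")  # hypothetical accession
imported = Folder("Imported files")
project = Folder("My project")
imported.add(reads)
project.add(reads)  # "Add to": a second link to the same file

reads.name = "Sample 1 (liver) raw reads"
print(project.links[0].name)
print(parent_containers(reads, [imported, project]))
```

Because both folders point to the same file, renaming it once changes the name seen in every folder, while the accession stays the same.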

Above the File Manager pane, you can find the Import button. Clicking it takes you to the Import application page, where you can upload your files, import them into the platform and edit their metainfo.

../_images/import-button.png

Next to the Import button, you can see a New Folder button. Using it you will be able to create a new folder wherever you want. Another option — New folder with selection — appears when you have selected files and want to put all of them in a separate folder.

../_images/new-folder.png

The Preprocess, Analyse, Explore and Manage menus at the top of the page correspond to the four groups of applications that can be used to process and view data. These menus will become available when you select a file.

../_images/matching-apps.png

When you choose a file, the system will suggest applications which can work with the specific file type (e.g. sequencing assay). However, you still need to think about the nature of the data. For instance, if you want to align a raw WGBS sequencing assay, Genestack will suggest several mappers, but only the Bisulfite Sequencing Mapping application will be suitable in this case. To figure out what applications are recommended to process WGBS, WES, RNA-Seq or other sequencing data, go to the Applications review section of this guide.

File search in the top-right corner allows you to search for files by metadata (names, organism, method). To limit the search by file type or whether or not the file is shared with you, click on the arrow inside the search box.

../_images/file-search.png

Below the search box is a button to access your briefcase. Your briefcase is a place where you can temporarily store files from various folders. To delete an item from your briefcase hover over it and click on the “x” button. To clear all items from the briefcase, select the “Clear all” option.

../_images/brief-case-1.png

To add files to your briefcase, hover over each individual file and use the special “briefcase” button or select several files, right-click on them and choose “Add to briefcase…”.

../_images/brief-case-2.png

If you select a file, three additional buttons will show up, allowing you to share, delete the file or view metainfo (an “eye”-icon) for the file.

../_images/3buttons.png

Use the Share button to share your data with colleagues (the share button will not be available if you are using a guest account). Read more about sharing on Genestack in the section Sharing data and collaboration.

../_images/share.png

The Delete button allows you to remove your files from the system.

../_images/delete.png

The View metainfo button gives you more information about the file: technical (file type, its owner, when the file was created and modified, etc.), biological (e.g. cell line, cell type, organism, etc.), and file permissions.

../_images/eye.png

Managing metadata

The Metainfo Editor application enables you to explore metadata for datasets or standalone files. In addition, if you have sufficient permissions, you can edit metadata or import it from a spreadsheet in .xls, .xlsx or .csv format. You can access the Metainfo Editor from anywhere in the platform via the context menu. Moreover, metadata editing is the last step in the data import process (see the Import section for more information). File metadata is shown in Excel-like tables in which columns represent metainfo fields, such as ‘Organism’, ‘Cell line’ or ‘Platform’.

../_images/metainfo-editor.png

Edit metadata manually

By default, the metainfo table is based on the Default Import Template, which you can easily replace with a custom one (learn more about templates in the section Importing data). To do so, click on the template’s name, select Change template, and select the template you want in the pop-up window.

../_images/change-template.png

When you start typing in a cell, terms from our controlled dictionaries will be suggested where possible. You are free to enter any value; however, we encourage you to use our standardized terminology, which helps you avoid typos and harmonise metadata.

../_images/tissue-dict.png

Furthermore, you can add several terms to one metadata field for each file. To do so enter the first term as usual, click the button Add another and either add one of the existing fields or create your own one (i.e. custom key).

../_images/add-attribute.png ../_images/add-attribute-1.png

If you create a new metadata field, you also need to specify its type: for example, select “Text” for free-text values and “Integer” or “Decimal” for numeric values.

../_images/custom-key.png
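As an illustration of why the field type matters, here is a minimal Python sketch of how a typed field could validate what is entered into it. Only the type names “Text”, “Integer” and “Decimal” come from the text above; the function `parse_value` and the `CONVERTERS` table are hypothetical and are not part of the platform.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical converters keyed by the type names mentioned in the guide.
CONVERTERS = {"Text": str, "Integer": int, "Decimal": Decimal}


def parse_value(raw, field_type):
    """Convert a raw cell value to the declared field type, or raise ValueError."""
    try:
        return CONVERTERS[field_type](raw.strip())
    except KeyError:
        raise ValueError(f"unknown field type: {field_type}")
    except (ValueError, InvalidOperation):
        raise ValueError(f"{raw!r} is not a valid {field_type}")
```

With this sketch, a value such as "4.2" would be accepted for a “Decimal” field but rejected for an “Integer” one.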

Click a column name to sort the metadata by that column or to delete the column if needed.

../_images/sort.png

Import metainfo data from your computer

To begin, click the Import data from spreadsheet button. Then, choose a CSV, XLS or XLSX file with metadata that you would like to attach.

../_images/from-spreadsheet-1.png

Make sure that the sample names in the imported file are the same as the ones shown in the “Name” column of the Metainfo Editor application. Information for names that do not match will not be imported. It will be marked in red, so you can easily fix it by clicking on the “Select file” link.

../_images/from-spreadsheet-2.png
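If you want to check a spreadsheet before uploading it, the name-matching rule described above can be approximated locally. The following Python sketch assumes a CSV file with a “Name” column and exact, case-sensitive matching; both the function `split_by_name` and the exact matching behaviour are assumptions for illustration, not a description of how the platform is implemented.

```python
import csv
import io


def split_by_name(editor_names, spreadsheet_text):
    """Return (matched_rows, unmatched_names) for CSV text with a 'Name' column.

    editor_names: sample names as shown in the Metainfo Editor "Name" column.
    """
    known = set(editor_names)
    matched, unmatched = [], []
    for row in csv.DictReader(io.StringIO(spreadsheet_text)):
        if row["Name"] in known:
            matched.append(row)
        else:
            # These rows correspond to what the editor would mark in red.
            unmatched.append(row["Name"])
    return matched, unmatched
```

For example, calling `split_by_name(["sample_1", "sample_2"], text)` on a file that contains rows for `sample_1` and `sample_X` would report `sample_X` as unmatched.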

During the metadata import process you can also decide whether a column should be imported, and associate it with another metadata field, by clicking on the name of the column.

../_images/from-spreadsheet-3.png

Compose file names using metainfo keys

When you have finished describing your samples, you can use the metadata to name them. Click the Apply naming scheme button and select the metainfo fields that you want to use to create names.

../_images/naming-scheme.png
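Conceptually, a naming scheme joins the values of the selected metainfo fields into one name. The Python sketch below shows the idea; the separator, the handling of empty fields and the function name `compose_name` are assumptions made for illustration and may differ from what the application produces.

```python
def compose_name(metainfo, keys, separator=" - "):
    """Build a name from the values of the selected metainfo keys.

    Keys that are missing or empty for a file are skipped (an assumption).
    """
    parts = [str(metainfo[key]) for key in keys if metainfo.get(key)]
    return separator.join(parts)


sample = {"Organism": "Homo sapiens", "Cell line": "HeLa", "Platform": ""}
name = compose_name(sample, ["Organism", "Cell line", "Platform"])
```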

Make a subset

If you want to analyse only some of the samples in a dataset, you can make a subset. There are two ways of doing this: select the samples you want to analyse using the checkboxes and click Make a subset; or open the metainfo summary and specify the metainfo values that will be used as a rule to create the subset and filter out all non-matching files.

../_images/metainfo-summary.png
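The second approach, a rule built from metainfo values, behaves like a filter over the files' metadata. A minimal Python sketch of that idea follows; the data layout (one dictionary per file) and the function `make_subset` are hypothetical, and the sketch assumes a file must match every field in the rule.

```python
def make_subset(files, rule):
    """Keep the files whose metainfo matches every field in the rule.

    files: list of dicts mapping metainfo keys to values.
    rule: dict mapping a metainfo key to the set of accepted values.
    """
    return [
        f for f in files
        if all(f.get(key) in accepted for key, accepted in rule.items())
    ]


files = [
    {"Name": "s1", "Organism": "Homo sapiens", "Cell line": "HeLa"},
    {"Name": "s2", "Organism": "Homo sapiens", "Cell line": "K562"},
    {"Name": "s3", "Organism": "Mus musculus", "Cell line": "3T3"},
]
subset = make_subset(files, {"Organism": {"Homo sapiens"}, "Cell line": {"HeLa", "K562"}})
```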

Once you are happy with the metadata for your files, you can proceed to analyse them by clicking the button Use dataset. You can use the suggested visualisation applications to explore your files (for example, “FastQC Report” to check the quality of raw reads), use one of the existing public data flows, or build your own pipeline by adding applications step by step. You can also share the files with your collaborators and add them to a folder of your choice.

../_images/run-df-from-me.png

Sharing data and collaboration

Access control model

There are three concepts around access control in Genestack: users, groups and organisations. Each user belongs to a single organisation (typically corresponding to the user’s company or institution, or a specific team within the institution). Organisations have two types of users: regular users and administrators, who have the right to add new users, and deactivate existing ones.

To check which organisation you belong to, you can go to the Profile page, accessible via the menu which opens when you click on your email address at the top-right corner of any page.

../_images/profile_menu.png

Managing users

If you are an administrator of your organisation, the shortcuts menu that appears when you click on the Genestack logo will also have an additional item, Manage Users, which takes you to the organisation’s user management page.

../_images/shortcuts_manage_users.png

From there, administrators can add or disable users, and reset passwords.

../_images/pr_manage_users.png

Sharing in Genestack is done through groups: every user can create any number of groups and add other users to them. Each file in the system can be shared with any number of groups, and each group can be granted different permissions (read-only, read and write, etc.).

Managing groups

To manage your groups, click on the Genestack logo at the top-left corner of any screen and select Manage groups.

../_images/shortcuts_manage_groups.png

From there, you can create groups using the Create group button, add or remove people from groups, and change users’ privileges within groups. By default, you will be a group administrator of any group that you create.

../_images/manage_create_groups.png

If you are an administrator of a group, you can click the Add member button to add people to a group. From there you will be prompted for the e-mail of the user you want to add. If they are in your organisation, you will be provided with autocomplete.

../_images/group_add_member.png

Note

Can I add users from other organisations?

You can also add users from other organisations to a group (“cross-organisation group”). However, in that case, every user invitation will need to be approved by an organisation administrator of both your organisation and the other user’s organisation.
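The approval rule in this note can be pictured as a small check. The Python sketch below is only an illustration of the rule as stated here: the function `invitation_is_approved` and the way approvals are recorded (a collection of organisation names whose administrator has approved) are hypothetical.

```python
def invitation_is_approved(inviter_org, invitee_org, approving_orgs):
    """Decide whether a group invitation can take effect.

    approving_orgs: organisations whose administrator has approved the invitation.
    """
    if inviter_org == invitee_org:
        # Same organisation: no administrator approval is needed.
        return True
    # Cross-organisation group: both organisations must have approved.
    return {inviter_org, invitee_org} <= set(approving_orgs)
```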

Once you have added a user from your organisation to the newly created group, you will also be able to set up their permissions within the group. Within a group, a user can be:

  • Non-sharing user (can only view data shared with the group);
  • Sharing user (can view data shared with the group, and share data);
  • Group administrator (all of the above, and can add/remove users to the group and change users’ privileges).

By default, newly added users will be granted the lowest permission level (Non-sharing user). You can change that using the drop-down next to their name.

../_images/users_permissions.png
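Because each level includes the rights of the one below it, the three roles can be modelled as an ordered scale. The Python sketch below illustrates this; the class and function names are hypothetical, and only the role names and their rights are taken from the list above.

```python
from enum import IntEnum


class GroupRole(IntEnum):
    """Group roles ordered from the fewest to the most privileges."""
    NON_SHARING_USER = 1
    SHARING_USER = 2
    GROUP_ADMINISTRATOR = 3


# Newly added users start at the lowest level.
DEFAULT_ROLE = GroupRole.NON_SHARING_USER


def can_view(role):
    return role >= GroupRole.NON_SHARING_USER


def can_share(role):
    return role >= GroupRole.SHARING_USER


def can_manage_members(role):
    return role >= GroupRole.GROUP_ADMINISTRATOR
```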

Sharing files with a group

If you are a sharing user or an administrator of a group, you can share files with that group. Any file created on Genestack can be shared.

To share a file, click the file name and select the Share option in the context menu. In addition, some apps, such as Data Browser, Metainfo Editor or File Manager, have a dedicated Share button.

../_images/sharing_experiment.png

From there, you will be taken to the file sharing dialog, which asks you to select a group to share the file with. By default, files are shared with read-only permissions (both for data and metadata), but you have the option of allowing members to edit the files as well as view them.

../_images/sharing_dialog.png

Once you click the blue Share button, you will be asked whether you would like to link the file into the group’s shared folder.

../_images/sharing_with_link.png

If you link the file into that folder, it will be visible to the group’s users when they open that folder (which can make it easier for them to find it). If you click “No”, the file will not be linked into the group folder but the group’s users will still be able to find the file through the File Search box (for instance, if you tell them the accession of the file), in File Provenance and through the Data Browser.

Each group has an associated group folder which you can access from the File Manager under “Shared with me” in the left-hand side panel.

../_images/shared_with_me.png

All files you share with other people, along with all files shared with you, will be located in that folder.

It is also possible to share files directly from the application pages; for example, to share a FastQC Report with your collaborators, click the QC report name and select the Share option in the drop-down list.

../_images/share-in-bioapp.png

You can also share your datasets from the Metainfo Editor page with the Share button.

../_images/share-in-ME.png

Building pipelines

Bioinformatic data analysis includes several steps, which vary depending on the type of data and your goals. For instance, WGS data analysis includes the following steps: checking the initial quality of raw reads, preprocessing the data to improve its quality if needed, and aligning the reads to a reference genome, followed by identification and annotation of genetic variants.

With Genestack you can either use one of the dataflows or build a pipeline manually by selecting customizable applications supported by the system.

Use the Data Browser to find a dataset you would like to analyse and click on it. Then, on the Metainfo Editor page, click on the button marked Analyse to start creating a pipeline. If you want to analyse only part of the dataset, select the assays you wish to analyse and click Make a subset.

Next, select the first application you wish to see in your pipeline. For each individual file the system suggests only applications that can be used to analyse your data, considering its type and metadata.

Applications on the platform are divided in several categories:

  • Preprocess to prepare the data for actual analysis;
  • Analyse to perform various kinds of analysis;
  • Explore to visualise QC check or analysis results;
  • Manage to operate with your files.
../_images/pipeline_building.png

This will take you to the application page where you can:

  • learn more about the application;
  • view and edit application parameters;
  • explore your results;
  • add further steps to the file data flow (the pipeline).
../_images/cla_page.png

To proceed, click on the Add step button, which will show you the list of all the matching applications.

../_images/cla-add-step.png

Continue adding steps until you have finished building your pipeline. When you add each of the steps, you create new files which end up in the Created files and My datasets folders. However, these files are not yet ready to use — they need to be initialized first.

Reproducing your work

For any dataset in the system, you can learn where the data came from and replay exactly the same analysis on other data.

  • File Provenance

    The File Provenance application allows you to explore the history of data and to learn how a given dataset was generated. Clicking the New folder with files button creates a folder containing all the files used in the pipeline.

../_images/dataset-provenance.png

To view a text description of the pipeline, including all the steps, click the View as text button. It shows which applications, parameters and tools were used at each step of the analysis.

../_images/view_as_text.png
  • Data Flow Editor

    If you want to reuse the same pipeline on different data, you can create a data flow identical to the pipeline used to create the original file by selecting the file of interest and choosing Create new Data Flow from the available “Manage” applications. This will open the Data Flow Editor application, which gives a visual representation of the pipeline and allows you to choose your input files, such as raw reads and a reference genome. Note that a range of public reference genomes have already been imported from Ensembl and are readily available on the platform. To add new inputs to the created data flow, click choose sources. At this stage, no files have been created or initialized.

../_images/data-flow-editor-2.png

Click the Run dataflow button to continue; it will take you to the Data Flow Runner application.

  • Data Flow Runner

    The Data Flow Runner application allows you to run the pipeline. Click the Run dataflow button to create all the relevant files in an uninitialized state. A separate file is created for each individual input file at every step of the analysis. You can find them in a separate folder inside the “Created files” folder.

../_images/data-flow-runner-1.png

When the files are created, you will be prompted to either start initialization right away or delay it till later. You can check and change parameters only before the computations have started. To do so, click the application name in the corresponding node of the data flow. As soon as the initialization process has started, the files can no longer be changed.

../_images/data-flow-runner-3.png

Finally, whether or not you decide to start the computation, you will be shown a list of matching applications to explore the results or continue the analysis.

../_images/data-flow-runner-4.png

Public data flows

On our platform, you can find a range of public data flows we have prepared for our users. We cover most of the common analysis types:

  • Whole Genome Methylation Analysis
  • Whole Exome Sequencing Analysis
  • Single-cell Transcriptomic Analysis
  • Prediction of Genetic Variants Effects
  • Isoform Expression Statistics
  • Genetic Variation Analysis
  • Gene Expression Statistics
  • Targeted Sequencing Quality Control
  • Raw Reads Quality Control
  • Mapped Reads Quality Control
  • Agilent Microarray Quality Control
  • Affymetrix Microarray Quality Control
  • Unspliced Mapping
  • Spliced Mapping

Clicking on the data flow will take you to the Data Flow Runner where you can add source files and a reference genome. When you have chosen your files, click on the button marked as Run data flow. If you do not want to change any settings, you can click Start initialization now. To tweak the parameters and settings of the applications, select Delay initialization till later. To change the settings, click on the name of the application in the data flow. This will take you to the application page, where you can select Edit parameters and introduce your changes. When you are happy with parameters, go back to the data flow and start initialization.

Initialising files

You can initialize files in different ways:

  1. Using the Start initialization option in the context menu.

For instance, click on the name of the created dataset at the top of the application page and select Start initialization.

../_images/start_initialization.png
  2. Clicking Start initialization now in the Data Flow Runner application.

If you want to save the pipeline and the specific parameters you used so that you can re-use them on other files, you can create a new data flow with the Data Flow Editor app.

../_images/data_flow_editor.png

To proceed, click on the Run dataflow button and create all the relevant files for each app in the pipeline. This will take you to the Data Flow Runner page, where you can check or change the parameters of the applications by clicking the app name and then initialize the computations with the Run Data Flow button in the last cell.

../_images/run_data_flow.png

Choose the Start initialization now option if you would like to run the computations immediately, or Delay initialization till later.

../_images/start_initialization_now.png

This data flow, along with all your results (after computations are finished), will be stored in the Created files folder.

  3. Using the File Initializer application.

Select the files you are interested in, right-click on them, and choose File Initializer in the Manage section.

../_images/file_initializer_df.png

The File Initializer reports the status of the files and allows you to initialize those that need it by clicking on their respective Go! buttons, or on Initialize all to do them all at once. Files do not need to be produced by the same applications to be initialized together.

../_images/file_initializer.png
  4. Using the Start initialization button in the File Provenance.

Alternatively, you can click on the name of the last created file, go to Manage and choose the File Provenance application. The application displays the pipeline and also allows you to run the computation using the Start initialization button. Doing this will begin initialization of all the files (including intermediate files) you have created whilst building this pipeline.

../_images/file_provenance_init.png
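Since each file in a pipeline is derived from earlier files, initializing a whole pipeline implies processing the files in dependency order. The Python sketch below illustrates that ordering with the standard library's `graphlib` module (Python 3.9 or later). The file names and the function `initialization_order` are hypothetical, and the sketch is not a description of how the platform schedules tasks internally.

```python
from graphlib import TopologicalSorter


def initialization_order(derived_from):
    """Return file names ordered so that every file comes after its sources.

    derived_from: dict mapping each file to the set of files it is derived from.
    """
    return list(TopologicalSorter(derived_from).static_order())


# A hypothetical pipeline: trimming, mapping, then variant calling.
pipeline = {
    "trimmed reads": {"raw reads"},
    "mapped reads": {"trimmed reads", "reference genome"},
    "variants": {"mapped reads"},
}
order = initialization_order(pipeline)
```

In the resulting list, the intermediate files ("trimmed reads", "mapped reads") always come before the final "variants" file that depends on them.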

Regardless of how you start initialization, you can track the progress of tasks in the Task Manager.

Task manager

In the top-right corner of any page on Genestack, you can see a link called Tasks. It will take you to the Task Manager, an application which allows you to track the progress of your computations.

../_images/task-manager.png

Tasks can be sorted and filtered by application name, file name, accession, status, the user who started the task, last update and elapsed time.

Statuses in the Task Manager help you keep track of your tasks. Let’s look at what each status means:

  • Starting: the computation process has started to run;
  • Done: the task has finished successfully;
  • Failed: the computation has failed. To find out why, click on View logs;
  • Queued: the task is waiting to be executed, for example when it is waiting for dependencies to complete initialization;
  • Running: your task is in progress;
  • Blocked by dependency failure: the computation cannot be completed because a task on which this one depends has failed;
  • Killed: the task has been canceled by the user.
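The relationship between these statuses, in particular how a queued task depends on the tasks before it, can be summarised in a short Python sketch. The status names are taken from the list above; the `FINISHED` grouping, the function names and the treatment of edge cases are assumptions made for illustration.

```python
from enum import Enum


class TaskStatus(Enum):
    QUEUED = "Queued"
    STARTING = "Starting"
    RUNNING = "Running"
    DONE = "Done"
    FAILED = "Failed"
    BLOCKED = "Blocked by dependency failure"
    KILLED = "Killed"


# Statuses after which a task does not progress further (assumption).
FINISHED = {TaskStatus.DONE, TaskStatus.FAILED, TaskStatus.BLOCKED, TaskStatus.KILLED}


def is_finished(status):
    return status in FINISHED


def next_status_for_queued(dependency_statuses):
    """Decide what a queued task does next, given its dependencies' statuses.

    Assumption: a blocked dependency blocks this task as well. What happens
    when a dependency is killed is not described in the guide, so this
    sketch simply leaves the task queued in that case.
    """
    deps = list(dependency_statuses)
    if any(s in (TaskStatus.FAILED, TaskStatus.BLOCKED) for s in deps):
        return TaskStatus.BLOCKED
    if all(s is TaskStatus.DONE for s in deps):
        return TaskStatus.STARTING
    return TaskStatus.QUEUED
```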

You can also view the output and error logs produced for each task. Error logs tell you why a task has failed. Output logs contain the exact details of what Genestack does with your files during the computation process, which specific tools and parameters are used, and so on. If the computations finished successfully, the error logs will be empty, but the output logs can still provide you with some basic information about the output data.

../_images/task-log.png

If you change your mind about a computation after it has started, remember that you can kill tasks whenever you want by clicking the Cancel button next to the task status. To rerun an analysis, click the file name and select Restart initialization.

Data export

Genestack provides secure data storage, and the Export Data application allows you to safely download both assays and analysis results, together with the attached metadata, to a local machine.

Select the files you want to export, right-click on them and choose the Export Data application. On the application page you will see the status of your files; if some of them are not initialized, you will be prompted to initialize them prior to export.

../_images/export1.png

If you change your mind, you can stop the export process by clicking on the Cancel button.

../_images/export2.png

The application creates a temporary Export file that contains a special link to download the selected files. All the Export files are stored in the “Exports” folder.

../_images/export3.png

Sharing the link enables your collaborators to download data even if they do not have a Genestack account. The platform may remove the created export file after some time. This means that the corresponding download link will no longer be accessible; however, the data itself will not be affected.