|
24sevenZed
~Z3D3Z~



Registered: 11/05/16
Posts: 90
Loc: CYBERSPACE
Last seen: 3 years, 8 months
|
biopython shenanigans : P. Cubensis data from NCBI 2
#26675413 - 05/16/20 06:02 PM (3 years, 8 months ago) |
|
|
Really bored right now and having no AC, going crazy.
So I decided to turn Alan's little snippet of phylogeny data of Psilocybe Cubensis (from the type collection thread in Hunting) into a Biopython script to query NCBI Entrez for the data. This is about the most bioinformatics I've ever done, so it's all basically gibberish to me. I'm just fucking around. Maybe it could be useful somehow though.
Here's the script at present (fuck github the shroomery is my code host):
Code:
#!/usr/bin/env python3
# Modified from:
# https://dmnfarrell.github.io/bioinformatics/assemblies-genbank-python
# to search the NCBI Entrez core nucleotide database, using biopython

# Prereqs: python3.8 venv; pip3 install biopython
# Usage: ./cubefam.py or python3 cubefam.py

from Bio import Entrez

# provide your own mail here
Entrez.email = "email@example.com"

# Alan Rockefeller's Cubensis Relative Identifiers, from:
# https://www.shroomery.org/forums/showflat.php/Number/26647546#26647546
# "Accession.Version" : "Description"
cube_relatives = {
    "MH614275.1" : "Psilocybe Cubensis Mexico Veracruz Xico",
    "NR_156572.1" : "Psilocybe chuxiongensis China",
    "MK214734.1" : "Psilocybe ovoideocystidiata USA Maryland",
    "MH050397.1" : "Psilocybe ovoideocystidiata USA California SF",
    "MG984054.1" : "Psilocybe subaeruginascens South Africa",
    "KX357890.1" : "Psilocybe thaiaerugineomaculans Thailand",
    "MF955122.1" : "Psilocybe azurescens Canada BC Vancouver"
}

def get_nucleotide_summary(id):
    """Get esummary for an entrez id"""
    esummary_handle = Entrez.esummary(db="nucleotide", id=id)
    esummary_record = Entrez.read(esummary_handle)
    return esummary_record

def get_nucleotide(term, acc):
    """
    Get *nucleotides* for a given search term.
    Args:
        - term: search term, usually organism name
        - acc: the specific identifier we are looking for

    Original script downloaded data, I just wanna grab the referenced
    sequences and do stuff with them in memory.
    """
    handle = Entrez.esearch(db="nucleotide", term=term, retmax='200')
    record = Entrez.read(handle)
    ids = record['IdList']
    print(f'[!] Found {len(ids)} ids associated with "{term}"')
    for id in ids:
        summary = get_nucleotide_summary(id)
        title = summary[0]["Title"]
        if summary[0]["AccessionVersion"] == acc:
            print(f'[!] Found accession {acc}: "{title}"')
            # now we just want to find ITS sequence and print it out
            handle = Entrez.efetch(db="nucleotide", id=id, retmode="xml")
            records = Entrez.read(handle)
            genbankz = len(records)
            if genbankz > 1:
                print(f'[!] There are {genbankz} genbank records associated.')
            else:
                print(f'[!] There is 1 genbank record associated.')
            print(f'[!] Printing out the first DNA sequence we can find')
            for record in records:
                if record["GBSeq_moltype"] == "DNA":
                    print(f'[!] A DNA Sequence was retrieved: {record["GBSeq_sequence"]}')

if __name__ == "__main__":
    for acc in cube_relatives:
        desc = cube_relatives[acc]
        # grab genus + species from the description.
        shroom = (desc.split(" ")[0] + " " + desc.split(" ")[1]).lower()
        print(f'--- Grabbing nucleotide data for "{desc}" --- ')

        # only grab nucleotide for one species and record at a time
        # Biopython tutorial recommends we *DONT* do this, and we should run
        # up against throttling if we do, so this should be done as a batch
        # and then parsed, but that's later:
        get_nucleotide(shroom, acc)

        print("\n")
Here's the output of a recent run:
Code:
--- Grabbing nucleotide data for "Psilocybe Cubensis Mexico Veracruz Xico" ---
[!] Found 106 ids associated with "psilocybe cubensis"
[!] Found accession MH614275.1: "Psilocybe cubensis voucher Mushroom Observer # 295160 internal transcribed spacer 1, partial sequence; 5.8S ribosomal RNA gene and internal transcribed spacer 2, complete sequence; and large subunit ribosomal RNA gene, partial sequence"
[!] There is 1 genbank record associated.
[!] Printing out the first DNA sequence we can find
[!] A DNA Sequence was retrieved: ggcgtggttgtagctggccctctcgggggcatgtgctcgcccgtcatctttatatttccacctgtgcactttttgtagatcattgtttttggaagctggattgaagtcagagattactctctgatgaattgaaggctttctcaatgatggtctacgttttcatatactccaatgaatgtaacagaatgtatctatatggccttgtgcctataaaacaatatacaactttcagcaacggatctcttggctctcgcatcgatgaagaacgcagcgaaatgcgataagtaatgtgaattgcagaattcagtgaatcatcgaatctttgaacgcaccttgcgctccttggtattccgaggagcatgcctgtttgagtgtcattaaattctcaaccttaccagcttttgttagcttgtgtaatggcttggacttgggggtttattttgccggcttcttaccaagtcagctccccttaaatgcattagccggctgcccgctgtggaccgtctattggtgtgataattatctacgccgtggatgtctactattaatgggttgaagctgcttcaaaccgtctgtttactcagacaattaatgacaatttgacctcaaatcaggtaggactacccgctgaacttaagcatatca
--- Grabbing nucleotide data for "Psilocybe chuxiongensis China" ---
[!] Found 16 ids associated with "psilocybe chuxiongensis"
[!] Found accession NR_156572.1: "Psilocybe chuxiongensis IFRD 414-011 ITS region; from TYPE material"
[!] There is 1 genbank record associated.
[!] Printing out the first DNA sequence we can find
[!] A DNA Sequence was retrieved: caaggtttccgtaggtgaacctgcggaaggatcattattgaataactttggcgtggttgtagctggccctctcgggggcatgtgctcgcccgtcatctttatatctccacctgtgcactttttgtagatcatcgttttggaagctggattgaagtcggagaggtctctctctgatgaattgaaagctttctcaatggcggtctacgttttcatatactccaatgaatgtaacagaatgtatctatatggccttgtgcctataaaacaatatacaactttcagcaacggatctcttggctctcgcatcgatgaagaacgcagcgaaatgcgataagtaatgtgaattgcagaattcagtgaatcatcgaatctttgaacgcaccttgcgctccttggtattccgaggagcatgcctgtttgagtgtcattaaattctcaaccttaccagcttttgttagcttgtgtaatggcttggacttgggggttcttttgccggcttcttacaaagccagctccccttaaatgcattagccggctgcccgctgtggaccgtctattggtgtgataattatctacgccgtggatgtctgctataatgggttgaagctgcttctaaccgtctgttcagtcagacaattaatgacaatttgacctcaaatcaggtaggactacccgctgaacttaagca
--- Grabbing nucleotide data for "Psilocybe ovoideocystidiata USA Maryland" ---
[!] Found 8 ids associated with "psilocybe ovoideocystidiata"
[!] Found accession MK214734.1: "Psilocybe ovoideocystidiata voucher MushroomObserver.org/238931 small subunit ribosomal RNA gene, partial sequence; internal transcribed spacer 1 and 5.8S ribosomal RNA gene, complete sequence; and internal transcribed spacer 2, partial sequence"
[!] There is 1 genbank record associated.
[!] Printing out the first DNA sequence we can find
[!] A DNA Sequence was retrieved: agaaattcttggtcacttagaggaagtaaaagtcgtaacaaggtttccgtaggtgaacctgcggaaggatcattattgaataactttggcgtggttgtagctggccctctcgggggcatgtgctcgcccgtcatctttatatctccacctgtgcacctttagtagacgtctttgttggaagctggataggagagaatgggtgctagtcactctttctcgagttgaaggctttctcaaggtcgctctatgttttcatataccccaagtatgtaacagaatgtatctatatggccttgtgcctataaaactatatacaactttcagcaacggatctcttggctctcgcatcgatgaagaacgcagcgaaatgcgataagtaatgtgaattgcagaattcagtgaatcatcgaatctttgaacgcaccttgcgctccttggtattccgaggagcatgcctgtttgagtgtcattaaattctcaaccttaccagcttttgttagcttgtgtaatggcttggacttgggggttttttgccggcttctaacaaagtcagctccccttaaatgcattagccggctgcccgctgtggaccgtctattggtgtgataattatctacgccgtggatgtctgctatcaatgggtttttaaagctgcttctaaccgtctgttcattcggacaatacaatgacaatttgacctccaaaatc
--- Grabbing nucleotide data for "Psilocybe ovoideocystidiata USA California SF" ---
[!] Found 8 ids associated with "psilocybe ovoideocystidiata"
[!] Found accession MH050397.1: "Psilocybe ovoideocystidiata voucher MushroomObserver.org/91440 internal transcribed spacer 1, partial sequence; 5.8S ribosomal RNA gene and internal transcribed spacer 2, complete sequence; and large subunit ribosomal RNA gene, partial sequence"
[!] There is 1 genbank record associated.
[!] Printing out the first DNA sequence we can find
[!] A DNA Sequence was retrieved: ctttggcgtggttgtagctggccctctcgggggcatgtgctcgcccgtcatctttatatctccacctgtgcaccttttgtagacgtctttgttggaagctggataggagagaatgggtgctagtcactctttctcgagttgaaggctttctcaaggtcgctctatgttttcatataccccaagtatgtaacagaatgtatctatatggccttgtgcctataaaactatatacaactttcagcaacggatctcttggctctcgcatcgatgaagaacgcagcgaaatgcgataagtaatgtgaattgcagaattcagtgaatcatcgaatctttgaacgcaccttgcgctccttggtattccgaggagcatgcctgtttgagtgtcattaaattctcaaccttaccagcttttgttagcttgtgtaatggcttggacttgggggttttttgccggcttctaacaaagtcagctccccttaaatgcattagccggctgcccgctgtggaccgtctattggtgtgataattatctacgccgtggatgtctgctatcaatgggtttttaaagctgcttctaaccgtctgttcattcggacaatacaatgacaatttgacctcaaatcaggtaggactacccgctgaacttaagcatatcaataagcggaggaaaagaaactaacaaggattcccctagtaactgcgagtgaagcgggaaaagctcaaatttaaaatctggcggtctccggccgtccgagttgtaatctagagaagtgttatccgcgctggaccgtgtac
--- Grabbing nucleotide data for "Psilocybe subaeruginascens South Africa" ---
[!] Found 6 ids associated with "psilocybe subaeruginascens"
[!] Found accession MG984054.1: "Psilocybe subaeruginascens voucher MushroomObserver.org/235235 internal transcribed spacer 1, partial sequence; 5.8S ribosomal RNA gene and internal transcribed spacer 2, complete sequence; and large subunit ribosomal RNA gene, partial sequence"
[!] There is 1 genbank record associated.
[!] Printing out the first DNA sequence we can find
[!] A DNA Sequence was retrieved: ctttggcgtggttgtagctggccctctcgggggcatgtgctcgcccgtcatctttatatctccacctgtgcaccttttgtagacgtctttgttggaagctgaataggagagaaagggtgctagtccactttttctcgagttgaaggctttctcaaggtcgctctatgttttcatataccccaagaatgtaacagaatgtatctatatggccttgtgcctataaaactatatacaactttcagcaacggatctcttggctctcgcatcgatgaagaacgcagcgaaatgcgataagtaatgtgaattgcagaattcagtgaatcatcgaatctttgaacgcaccttgcgctccttggtattccgaggagcatgcctgtttgagtgtcattaaattctcaaccttaccagcttttgttagcttgtgtaatggcttggacttgggggttttttgccggcttctaacgaggtcagctycccttaaatgcattagccggctgcccgctgtggaccgtctattggtgtgataattatctacgccgtggatgtctgctatcaatgggttttgaagctgcttctaaccgtccttagggacaatacaatgacaatttgacctcaaatcaggtaggactacccgctgaacttaagcatatc
--- Grabbing nucleotide data for "Psilocybe thaiaerugineomaculans Thailand" ---
[!] Found 4 ids associated with "psilocybe thaiaerugineomaculans"
[!] Found accession KX357890.1: "Psilocybe thaiaerugineomaculans voucher MFLU:10-0851 internal transcribed spacer 1, partial sequence; 5.8S ribosomal RNA gene, complete sequence; and internal transcribed spacer 2, partial sequence"
[!] There is 1 genbank record associated.
[!] Printing out the first DNA sequence we can find
[!] A DNA Sequence was retrieved: ttgaataactttggcgtggttgtagctggccctctcgggggcatgtgctcgcccgtcatctttatatctccacctgtgcaccttttgtagacgtctttgttggaagctggataggagaggaaaagggtgctagtcactttttttctcgagttgaaggctttctcaaggtcgctctatgttttcatataccccaagaatgtaacagaatgtatctatatggccttgtgcctataaaactatatacaactttcagcaacggatctcttggctctcgcatcgatgaagaacgcagcgaaatgcgataagtaatgtgaattgcagaattcagtgaatcatcgaatctttgaacgcaccttgcgctccttggtattccgaggagcatgcctgtttgagtgtcattaaattctcaaccttaccagcttttgttagcttgtgtaatggcttggacttgggggttttttttgccggcttctaacgaagtcagctccccttaaatgcattagccggctgcccgctgtggaccgtctattggtgtgataattatctacgccgtggatgtctgctatcaatgggttttcaagctgcttctaaccgtccttttggacaatacaatgacaa
--- Grabbing nucleotide data for "Psilocybe azurescens Canada BC Vancouver" ---
[!] Found 11 ids associated with "psilocybe azurescens"
[!] Found accession MF955122.1: "Psilocybe azurescens voucher UBC F-32225 internal transcribed spacer 1, partial sequence; 5.8S ribosomal RNA gene, complete sequence; and internal transcribed spacer 2, partial sequence"
[!] There is 1 genbank record associated.
[!] Printing out the first DNA sequence we can find
[!] A DNA Sequence was retrieved: tttcggcgctctacgttttcatataccccaaagaatgtaacagaatgtatcttatggctttatgcctataaactatatacaactttcagcaacggatctcttggctctcgcatcgatgaagaacgcagcgaaatgcgataagtaatgtgaattgcagaattcagtgaatcatcgaatctttgaacgcaccttgcgctccttggtattccgaggagcatgcctgtttgagtgtcattaaattctcaaccttaccagcttttgttagcttgtgtaatggcttggacttgggggtmttttgccggcttctctygagatgtcagctccccttaaatgcattagccggctgcccgctgtggaccgtctattggtgtgataattatctacgccgtggacgtctgctctcaatgggttgaagctgcttctaaccgyccgttcattcggacagcacataatgacaa
I don't actually recommend running this script as written more than a couple of times. It needs to be rewritten to respect NCBI's API and throttling requirements a bit more (it should fetch everything as a batch).
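A rough sketch of what the batch version could look like, assuming Biopython is installed. `chunked` and `fetch_fasta_batch` are made-up helper names, not part of the script above, and the batch size and pause are guesses meant to stay under NCBI's limit of three requests per second without an API key.

```python
import time


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_fasta_batch(accessions, email, batch_size=50, pause=0.4):
    """Fetch sequences for many accessions with one efetch call per batch.

    Returns a dict mapping record id -> sequence string.
    (Hypothetical helper; batch_size and pause are assumed values.)
    """
    # Imported here so chunked() stays usable without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address
    sequences = {}
    for batch in chunked(accessions, batch_size):
        # efetch takes a comma-separated id list, so the per-id
        # esearch/esummary loop is not needed when accessions are known.
        handle = Entrez.efetch(db="nucleotide", id=",".join(batch),
                               rettype="fasta", retmode="text")
        for record in SeqIO.parse(handle, "fasta"):
            sequences[record.id] = str(record.seq)
        handle.close()
        time.sleep(pause)  # pause between requests to avoid throttling
    return sequences
```

With the dict from the script, this would be called as `fetch_fasta_batch(list(cube_relatives), "you@example.com")`, which is a single request for all seven accessions.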
However, I think the next thing I will do is add mushroomobserver links for those with observations and then set up HTML output as well, and/or something I can copy/paste into this BBedit thing.
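Something like this might work for the links and the copy/paste output. It is only a sketch: `mo_link` and `bbcode_line` are hypothetical names, the regex only covers the two voucher spellings that show up in the output above, and the `https://mushroomobserver.org/<number>` URL form is assumed from the "MushroomObserver.org/238931" style in the titles.

```python
import re

# Matches both "Mushroom Observer # 295160" and "MushroomObserver.org/238931"
MO_PATTERN = re.compile(r"Mushroom\s*Observer(?:\.org/|\s*#\s*)(\d+)", re.IGNORECASE)


def mo_link(title):
    """Return a Mushroom Observer URL for a GenBank title, or None if no voucher id is found."""
    match = MO_PATTERN.search(title)
    if match is None:
        return None
    # Assumed URL form, based on how the vouchers are written in the titles
    return f"https://mushroomobserver.org/{match.group(1)}"


def bbcode_line(accession, title):
    """Format one record as a BBCode line, linking the observation when there is one."""
    link = mo_link(title)
    if link is None:
        return f"[b]{accession}[/b] {title}"
    return f"[b]{accession}[/b] [url={link}]{title}[/url]"
```

Records without a Mushroom Observer voucher (the MFLU and UBC ones) just come out as plain text.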
|
PTreeDish



Registered: 04/22/18
Posts: 353
Last seen: 4 months, 3 days
|
Re: biopython shenanigans : P. Cubensis data from NCBI [Re: 24sevenZed] 1
#26675732 - 05/16/20 09:12 PM (3 years, 8 months ago) |
|
|
Neat. If you moved this to an iPython notebook and linked to a repo on github, other folks could run the code automatically and you could present a more human-readable output of the data.
Maybe you could also allow the user to input a search param, turn the sequence into a visualization and/or generate and show the phylogenetic tree for the sp. Those would also make the script more useful, since right now it doesn't really offer much more than what one could do manually on the site.
Edited by PTreeDish (05/20/20 12:40 AM)
|
Alan Rockefeller
Mycologist

Registered: 03/10/07
Posts: 48,311
Last seen: 1 day, 18 hours
|
Re: biopython shenanigans : P. Cubensis data from NCBI [Re: PTreeDish]
#26680233 - 05/19/20 03:26 AM (3 years, 8 months ago) |
|
|
A script that downloaded sequences and made a phylogenetic tree would be really cool. I wish there was a way to make a tree, then allow the users to click on nodes they wanted to delete, and re-make it. That would save a ton of time.
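Without the clicking part, the "delete and re-make" loop could be sketched with Biopython's distance-based tree builder. This assumes the sequences are already aligned to equal length (raw ITS sequences from GenBank would need an alignment step first); `drop_taxa`, `build_nj_tree` and `demo` are hypothetical names and the toy sequences are made up.

```python
def drop_taxa(aligned, unwanted):
    """Return a copy of the {name: aligned_sequence} dict without the unwanted names."""
    unwanted = set(unwanted)
    return {name: seq for name, seq in aligned.items() if name not in unwanted}


def build_nj_tree(aligned):
    """Build a neighbor-joining tree from pre-aligned, equal-length sequences."""
    # Lazy imports so drop_taxa() works without Biopython installed.
    from Bio.Align import MultipleSeqAlignment
    from Bio.Seq import Seq
    from Bio.SeqRecord import SeqRecord
    from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor

    alignment = MultipleSeqAlignment(
        [SeqRecord(Seq(seq), id=name) for name, seq in aligned.items()]
    )
    # "identity" scores distance as the fraction of mismatched columns
    constructor = DistanceTreeConstructor(DistanceCalculator("identity"), "nj")
    return constructor.build_tree(alignment)


def demo():
    """Draw a toy tree, drop one sequence, and draw the rebuilt tree."""
    from Bio import Phylo

    # Made-up aligned sequences, for illustration only
    toy = {
        "seq_a": "ACGTACGT",
        "seq_b": "ACGTACGA",
        "seq_c": "ACGAACGA",
        "seq_d": "TCGAACTA",
    }
    Phylo.draw_ascii(build_nj_tree(toy))
    Phylo.draw_ascii(build_nj_tree(drop_taxa(toy, ["seq_d"])))
```

Calling `demo()` prints both trees as ASCII art. An interactive front end would only need to collect the names to drop and call `build_nj_tree` again.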
|
PinkStormtrooper
Jet-Puffed


Registered: 04/11/20
Posts: 218
Loc: 10001110101
Last seen: 3 years, 4 months
|
Re: biopython shenanigans : P. Cubensis data from NCBI [Re: Alan Rockefeller]
#26680508 - 05/19/20 07:15 AM (3 years, 8 months ago) |
|
|
""Here's the script at present (fuck github the shroomery is my code host):""
aahahaahahahaahahaahahahahaahahahahahahahahahahahaahahahahahaaha
@Alan - you gave me an idea. we need a GEDCOM (genealogy) type app but for this.
-------------------- "say, you got a little astroglide on your moustache"
|
24sevenZed
|
Re: biopython shenanigans : P. Cubensis data from NCBI [Re: PinkStormtrooper]
#26681204 - 05/19/20 01:49 PM (3 years, 8 months ago) |
|
|
Quote:
PTreeDish said: Neat. If you moved this to an iPython notebook and linked to a repo on github
Thanks for the suggestions. I think this is a good idea and will hack around with it when I have some time. Part of the reason for writing anything like this in Python IMO is the sheer amount of supporting libraries and tools like Jupyter and free notebook hosting for this sort of thing that's out there nowadays. That's why I called out biopython in the first place - it let me play around with this sort of data and NCBI without knowing much more than a little Python.
Quote:
Alan Rockefeller said: A script that downloaded sequences and made a phylogenetic tree would be really cool.
I definitely think this would be possible with the aforementioned tools. I'm curious how one could code such an interactive tool up for use in Jupyter and how hard it would be or how long it would take. I'm sure someone has done it already or something close enough, so I'll have to go look.
Part of the reason I started looking at biopython and bioinformatics recently was a desire to understand this phylogenetic information that you've posted here and learn about how it's applied in mycology (and now, thanks to COVID, virology as well. Viruses are fucking crazy, and mycoviruses, of course, are also interesting).
Quote:
PinkStormtrooper said: ""Here's the script at present (fuck github the shroomery is my code host):""
aahahaahahahaahahaahahahahaahahahahahahahahahahahaahahahahahaaha
Seriously though, it would be cool to have a sort of shroomery myco-repo as this knowledge continues to expand. More and more data keeps coming online for Psilocybe species, and not just genomic data but other omic stuff as well, such as all the associated proteins and so forth. The cubensis transcriptome is out there now, or at least the shotgun sequence is probably waiting for annotation. There's a ton of stuff that could be done in Advanced Mycology just completely online now.
|
PTreeDish



|
Re: biopython shenanigans : P. Cubensis data from NCBI [Re: Alan Rockefeller]
#26682418 - 05/20/20 01:48 AM (3 years, 8 months ago) |
|
|
This might be what you're after: https://github.com/evolbioinfo/gotree
I think there are two things missing. The first is that it takes rexml as the input, so you'd need to use the genbank api to return the fasta from a query, pipe it to the rexml cli, then feed the output to gotree, which accepts many args for collapsing and modifying nodes.
The second is the interactive UI that shows the generated tree and reissues the generate command with whatever UI action was taken (like collapse node with id n). There is an html output option, but I doubt it lets you modify the tree from the page.
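For the first step of that pipeline (getting the fetched sequences into a FASTA file the command-line tools can read), a plain-Python sketch could look like this. `to_fasta` is a hypothetical helper, and the later tool invocations are left out because their exact flags aren't given here.

```python
def to_fasta(sequences, width=70):
    """Render a {header: sequence} dict as FASTA text, wrapping sequence lines at `width`."""
    lines = []
    for header, seq in sequences.items():
        lines.append(f">{header}")
        # Slice manually so gap characters in alignments are never treated as break points
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)
```

The result can be written out with `open("relatives.fasta", "w").write(to_fasta(seqs))` (file name made up) and handed to whatever builds the tree.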
I do kind of like the UI for tree viewing/manip on https://itol.embl.de/itol.cgi but they don't have an API that would make your script feasible.
Anyway, is that kind of along the lines of what you were thinking?
Edited by PTreeDish (05/20/20 01:51 AM)
|
Alan Rockefeller
|
Re: biopython shenanigans : P. Cubensis data from NCBI [Re: PTreeDish]
#26682700 - 05/20/20 06:59 AM (3 years, 8 months ago) |
|
|
Yes, those are pretty cool!
|
24sevenZed
|
Re: biopython shenanigans : P. Cubensis data from NCBI [Re: Alan Rockefeller]
#26683038 - 05/20/20 10:42 AM (3 years, 8 months ago) |
|
|
Yes I think that's going in the right direction - gotree looks awesome (and is probably nice and fast).
However, I was hoping to use something that would work in-line out of the box in Jupyter. Biopython uses matplotlib, I guess, to draw some of the trees: https://biopython.org/wiki/Phylo - but I don't know how you would make that interactive either. It looks like you can possibly do that with the Rich Output capabilities you linked to earlier?
OTOH, one could just write up a Web application that calls out to something like Gotree or a biopython script and skip Jupyter/iPython. I would start by evaluating which javascript libraries can draw proper graphs and do interactive stuff with them (and you could probably just integrate that into Jupyter as well if you wanted). Eg: https://js.cytoscape.org/
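As a sketch of the glue such a web app would need, here is one way to turn a tree into the node/edge element list that Cytoscape.js reads. The nested-tuple tree format and the names `tree_to_elements` and `elements_json` are assumptions for illustration; a real version would walk whatever tree object the backend produces.

```python
import json


def tree_to_elements(tree):
    """Convert nested tuples of leaf names into Cytoscape.js node and edge dicts."""
    elements = []
    counter = 0

    def visit(node, parent_id):
        nonlocal counter
        if isinstance(node, str):
            node_id = node  # leaves keep their own name as id
        else:
            node_id = f"internal_{counter}"  # unnamed branch points get generated ids
            counter += 1
        elements.append({"data": {"id": node_id}})
        if parent_id is not None:
            elements.append({"data": {"id": f"{parent_id}->{node_id}",
                                      "source": parent_id, "target": node_id}})
        if not isinstance(node, str):
            for child in node:
                visit(child, node_id)

    visit(tree, None)
    return elements


def elements_json(tree):
    """Serialize the element list so a page can pass it to cytoscape({elements: ...})."""
    return json.dumps(tree_to_elements(tree))
```

A tree like `(("P. cubensis", "P. chuxiongensis"), "P. azurescens")` becomes five nodes and four edges; deleting a node in the browser would then just mean posting its id back and rebuilding.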
The ITOL looks really cool. I wish it was open source. It does appear someone wrote an unofficial python client for it: https://github.com/albertyw/itolapi, which appears to be relatively recently maintained.
|
24sevenZed
|
Re: biopython shenanigans : P. Cubensis data from NCBI [Re: 24sevenZed]
#26683111 - 05/20/20 11:21 AM (3 years, 8 months ago) |
|
|
|
PTreeDish



|
Re: biopython shenanigans : P. Cubensis data from NCBI [Re: 24sevenZed]
#26684628 - 05/21/20 12:13 AM (3 years, 8 months ago) |
|
|
Def go web route - the interactivity is not really suited for jupyter. I'd look at how gotree generates the html output of trees. Maybe they have a rudimentary svg rendering you could borrow and/or iterate on.
|
AndyHinton


Registered: 12/05/16
Posts: 434
|
Re: biopython shenanigans : P. Cubensis data from NCBI [Re: PTreeDish]
#26688508 - 05/22/20 03:40 PM (3 years, 8 months ago) |
|
|
--------------------
|
24sevenZed
|
Re: biopython shenanigans : P. Cubensis data from NCBI [Re: AndyHinton]
#26697814 - 05/26/20 07:27 PM (3 years, 8 months ago) |
|
|
So I created this mostly empty project to start housing this code as I play around with all the ideas in this thread and whatnot:
https://gitlab.com/247z/fffungi.py
Right now the above script from this thread is sitting on the dev branch (https://gitlab.com/247z/fffungi.py/-/blob/dev/cubefam.py). If time permits and this actually becomes more than the one file, I can start adding other people to the project who are interested in messing with <whatever it actually becomes> too.
Also, AndyHinton - I need to thank you for https://biotorrents.de - what an awesome idea. I just got a chance to create an account and check it out. If in the course of playing with this stuff I’m generating any large data sets, I will try and get them on there.
I think we should try to get the Psilocybe serbica genome on there as well as the cubensis transcriptome (I’ll add NCBI links in here later). There should definitely be torrents for all the public Psilocybe datasets.
|
AndyHinton


|
Re: biopython shenanigans : P. Cubensis data from NCBI [Re: 24sevenZed]
#26701379 - 05/28/20 09:49 AM (3 years, 8 months ago) |
|
|
Thanks for your kind words, 24sevenZed.
Both the P. serbica genome and the P. cubensis transcriptome are already uploaded. I was going to post this thread's stuff in the bioinformatics board, but it's better if you do.
I've been away the last couple weeks (springtime waits for no man) but there are more updates coming, a bit more consolidation before I really start on the specific features the site needs.
Please check the Slack channel for eva's great suggestions. Damn I feel bad for leaving him hanging so long...
--------------------
|
|