Background
Priscilla Ashvane's relentless pursuit of power has ultimately led her to the Eternal Palace and into a dark pact with Queen Azshara. But what Ashvane receives in return for her loyalty may be more than she bargained for.
Overview - Fast Tactics
- This fight has two phases. In phase 1, you break the shield on the boss; once the shield breaks, phase 2 begins.
- In phase 1 you dodge the corals and the boss's abilities; in phase 2 you clear the corals and dodge abilities.
- Throughout the entire fight, you will deal with the following:
- Pools spawn under the tank every time the boss slams. Tank near the wall and move clockwise.
- 3 players get a bubble. Stack up and destroy the bubbles with AoE.
- Swirls appear under random players, dealing damage and tossing you into the air. Dodge them.
- Phase 1 – Corals spawn on the ground and deal damage. Orbs also spawn from the corals and travel to the boss; each orb that reaches the boss increases her armor. Dodge the corals and move the boss away from the orbs. Soak the orbs, but beware: the raid takes damage every time an orb is touched.
- Phase 2 – Three players are marked. After 8 sec, high Arcane damage blasts all players in a line between the marked players. The blasts destroy corals, so position yourself so that corals break.
Positioning
Tanks
Tank the boss at the side of the platform, by the wall.
Move the boss clockwise along the wall every time she slams, to stay clear of the erupting corals.
Melee
Stack together on the side of the boss (not behind).
Ranged
Stack up together, have a marked player and follow/stack on that player.
Abilities and Tactics
PHASE 1 – HARDENED CARAPACE
Hardened Carapace
- The boss has a shield that absorbs 40% of her total health.
- Every time the boss regrows her shield, it grows back 150% stronger.
Tactics: Hardened Carapace
- This shield is phase 1. As soon as you break it, phase 2 starts (around 75% of her health).
Coral Growth
- The boss summons corals on the platform.
- Players within 10 yards take damage and are knocked back.
Tactics - Coral Growth
- A lot of corals will spawn. Always check where they are and keep a visual overview of the platform.
- Move the boss away from the corals.
Rippling Wave
- The corals shoot out orbs that travel towards the boss.
- These orbs increase the shield on the boss by up to 10% and deal damage to all players.
- If a player touches an orb, it bursts and deals damage to all players. The player who touched the orb also gets a stacking debuff that deals damage every 3 sec for 15 sec.
Tactics: Rippling Wave
- Tanks
Kite the boss away from the orbs.
- Soakers
Soak the orbs, but not all at once, since the entire raid takes damage from every orb soaked.
Prioritise players that can handle heavy damage over time, since each orb applies a stacking debuff.
- Healers
Heavy damage will occur since the orbs must be soaked. Beware.
Briny Bubble
- The boss puts a water bubble on 3 players (the main tank and 2 random players).
- These 3 players are incapacitated and take damage every 1.5 sec until the bubble is destroyed.
Tactics: Briny Bubble
- Bubbled players should be stacked together, on the tank with the bubble.
- Nuke the bubbles down with AoE.
- Tank swap when the main tank gets bubbled.
Upsurge
- A blast of water erupts from a random player's position.
- Players who do not run out of it before it blasts take high damage and are tossed into the air.
Tactics: Upsurge
- Dodge the swirls on the ground.
Barnacle Bash - Tanks
- The boss slams the main tank with physical damage.
- The slam leaves a 35 sec debuff that increases damage taken from the next slam by 150%.
- Every slam also makes a sharp coral erupt from the ground, dealing damage every 1.5 sec to players standing on it.
Tactics: Barnacle Bash - Tanks
- Tanks
- The tank swap is synchronized with Briny Bubble: swap every time the main tank gets bubbled.
- Move the boss clockwise after every slam to avoid the circle of sharp corals.
- Melee
Attack the boss from the side so you don't take debuff damage from the sharp corals.
PHASE 2 – EXPOSED AZERITE
Exposed Azerite
The boss shoots energy bolts at random players periodically.
Arcing Azerite
- The boss puts a debuff on 3 random players.
- After 8 sec, the debuff expires and blasts high Arcane damage to all players in a line between the marked players.
- All corals in the line between the marked players are destroyed.
Tactics - Arcing Azerite
- This is how you clear the platform of corals.
- Marked players must position themselves so that the lines destroy the corals.
- Unmarked players must watch the lines and dodge them.
Briny Bubble
Same as phase 1
Upsurge
Same as phase 1
Barnacle Bash
Same as phase 1
When to Bloodlust/Timewarp/Heroism
After you break the shield on the boss (at the start of phase 2).
Principal components analysis (PCA) is a mainstay of population genetics, providing a model-free method for exploring patterns of relatedness within a collection of individuals. PCA was introduced as a tool for genetic analysis by Patterson, Price & Reich (2006). Subsequently Gil McVean (2009) provided an analytical framework for understanding PCA in terms of genetic ancestry. However, although PCA is widely used and the analytical details are worked out, there are a number of practical issues that come up when trying to run PCA on large SNP datasets from next-generation sequencing experiments. For example, small changes in how you prepare the input data can make a big difference to the outputs. The Ag1000G phase 1 data provide a concrete illustration of some of these issues, so I thought I'd try to bring together some experiences here.
Also, while PCA is fairly quick to run on smaller datasets, it can become slow and memory-intensive with larger data. A few months ago I discovered that scikit-learn includes a randomized SVD implementation, which is a potentially faster and more scalable method for approximating the top N components than using a conventional singular value decomposition. To evaluate randomized PCA I implemented some functions in scikit-allel which provide a convenience layer between underlying SVD implementations in NumPy and scikit-learn and the typical data structures I used to store genotype data. I know others have also started working with randomized PCA for genotype data (Galinsky et al. 2015) so I thought it would be interesting to apply both conventional and randomized SVD implementations to a non-human dataset and report some performance data.
Setup
I have a copy of the Ag1000G phase 1 AR3 data release on a local drive. The SNP genotype data is available in an HDF5 file.
Let’s work with chromosome arm 3L.
Set up the genotype data.
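The code cells are not preserved in this copy of the post, so here is a minimal sketch of the setup. The local filename is an assumption; the HDF5 group layout (3L/calldata/genotype) is inferred from the output below.

```python
import numpy as np
import h5py
import allel

# assumed local path to the Ag1000G phase 1 AR3 genotype file
callset = h5py.File('ag1000g.phase1.ar3.pass.3L.h5', mode='r')

# wrap the on-disk genotype dataset without loading it into memory
genotype = allel.GenotypeChunkedArray(callset['3L/calldata/genotype'])
genotype
```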
<GenotypeChunkedArray shape=(9643193, 765, 2) dtype=int8 chunks=(6553, 10, 2) nbytes=13.7G cbytes=548.0M cratio=25.7 compression=gzip compression_opts=3 values=h5py._hl.dataset.Dataset>

| | 0 | 1 | 2 | 3 | 4 | .. | 760 | 761 | 762 | 763 | 764 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | .. | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
| 1 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | .. | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
| 2 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | .. | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
| .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. |
| 9643190 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | .. | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
| 9643191 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | .. | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
| 9643192 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | .. | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
Count alleles at each variant. This takes a minute or so.
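A sketch of this step; slicing with [:] materialises the counts in memory as an AlleleCountsArray.

```python
# one row per variant, one column per allele (0 = reference)
ac = genotype.count_alleles()[:]
ac
```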
<AlleleCountsArray shape=(9643193, 4) dtype=int32>

| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 1527 | 3 | 0 | 0 |
| 1 | 1529 | 1 | 0 | 0 |
| 2 | 1528 | 2 | 0 | 0 |
| .. | .. | .. | .. | .. |
| 9643190 | 1512 | 16 | 0 | 0 |
| 9643191 | 1527 | 1 | 0 | 0 |
| 9643192 | 1507 | 18 | 1 | 0 |
Before going any further, I’m going to remove singletons and multiallelic SNPs. Singletons are not informative for PCA, and the analysis is simpler if we restrict to biallelic SNPs.
For interest, how many multiallelic SNPs are there?
How many biallelic singletons?
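A hedged sketch of both counts, using AlleleCountsArray helpers from scikit-allel:

```python
# multiallelic: more than one alternate allele observed
print(np.count_nonzero(ac.max_allele() > 1))

# biallelic singletons: one alternate allele, observed exactly once
print(np.count_nonzero((ac.max_allele() == 1) & ac.is_singleton(1)))
```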
Apply the filtering.
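One way to express the filter is to keep biallelic SNPs where both alleles are observed at least twice (the exact condition is my assumption); the compressed result lands in chunked storage, a Zarr array per the output below.

```python
flt = (ac.max_allele() == 1) & (ac[:, :2].min(axis=1) > 1)
gf = genotype.compress(flt, axis=0)
gf
```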
<GenotypeChunkedArray shape=(4825329, 765, 2) dtype=int8 chunks=(590, 765, 2) nbytes=6.9G cbytes=442.8M cratio=15.9 compression=blosc compression_opts={'shuffle': 1, 'cname': 'lz4', 'clevel': 5} values=zarr.core.Array>

| | 0 | 1 | 2 | 3 | 4 | .. | 760 | 761 | 762 | 763 | 764 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | .. | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
| 1 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | .. | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
| 2 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | .. | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
| .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. |
| 4825326 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | .. | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
| 4825327 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | .. | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
| 4825328 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | .. | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
Finally, transform the genotype data into a 2-dimensional matrix where each cell has the number of non-reference alleles per call. This is what we’ll use as the input to PCA.
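This step could look as follows: to_n_alt() counts the alternate alleles in each genotype call, giving 0, 1 or 2 for diploid calls.

```python
# 2-D matrix of alternate allele counts, variants x samples
gn = gf.to_n_alt()
```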
Note that we are still working with reasonably large amounts of data here, and so we are using chunked compressed arrays for storage. The gn variable above uses a Zarr array to store the data, one of several chunked storage containers that can be used with scikit-allel.
Removing correlated features (LD pruning)
As I understand it, PCA works best when the features you provide as input are independent from each other. Here each SNP is a feature. However, because DNA is transmitted from one generation to the next with some recombination between parents, genotypes at nearby SNPs tend to be correlated, with the correlation (linkage disequilibrium) decaying as you increase the separation between SNPs.
We can get a sense of that correlation structure by visualising pairwise linkage disequilibrium in the first 1000 SNPs.
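The plotting code is not preserved in this copy; here is a hedged sketch using Rogers & Huff r from scikit-allel, with the visualisation details being my own choices.

```python
import matplotlib.pyplot as plt
from scipy.spatial.distance import squareform

def plot_ld(gn, title):
    # pairwise r squared (Rogers & Huff 2009), expanded to a square matrix
    m = squareform(allel.rogers_huff_r(gn) ** 2)
    fig, ax = plt.subplots(figsize=(7, 7))
    ax.imshow(m, interpolation='none', cmap='Greys', vmin=0, vmax=1)
    ax.set_title(title)

plot_ld(gn[:1000], 'Pairwise LD, first 1000 SNPs.')
```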
The darker regions in the plot above indicate pairs of SNPs where genotypes are correlated.
Before I deal with this correlation directly, I’m going to thin down the data a bit. There are 4,825,329 SNPs left after the initial filtering steps above, and analysing this many features would be slow. Here we are more concerned with running an exploratory analysis, so I’m going to randomly choose a subset of these SNPs to work with. This should still reveal the main signals in the data, while making runtime faster.
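A sketch of the random downsampling; the subset size of 100,000 SNPs is an arbitrary choice for illustration.

```python
n = 100000  # assumed subset size
vidx = np.random.choice(gn.shape[0], n, replace=False)
vidx.sort()  # keep variants in genome order
gnr = gn.take(vidx, axis=0)
```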
By randomly downsampling SNPs, this should have dealt with much of the correlation between nearby features. Let’s take a look at the first 1000.
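Reusing the plot_ld sketch from above:

```python
plot_ld(gnr[:1000], 'Pairwise LD after random downsampling.')
```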
You can see that much of the correlation is gone. However, depending how dusty your screen is, you may be able to see some speckling, indicating that there are still some correlated SNPs in the dataset.
To remove this remaining correlation, I'm going to explicitly locate SNPs that are not correlated with each other, using the locate_unlinked function from scikit-allel. This is known as LD pruning, and works by sliding a window along the data, computing pairwise LD between all SNPs within each window, then removing one SNP from each correlated pair.
Conventionally, LD pruning is run just once; however, I'm going to run several iterations. In some cases this may make a difference to the results, in others it may not, probably depending on how much long-range LD is present in your samples. Running multiple iterations does slow things down a bit, but it's interesting to demonstrate and see what the effect is.
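A sketch of an iterative pruning helper built on locate_unlinked. The window size of 500 SNPs and the 5 iterations are stated in the text; the step size of 200 and the r squared threshold of 0.1 are my assumptions.

```python
def ld_prune(gn, size, step, threshold=.1, n_iter=1):
    for i in range(n_iter):
        # boolean mask of variants not correlated with others in the window
        loc_unlinked = allel.locate_unlinked(gn, size=size, step=step, threshold=threshold)
        n = np.count_nonzero(loc_unlinked)
        print('iteration', i + 1, 'retaining', n, 'removing', gn.shape[0] - n, 'variants')
        gn = gn.compress(loc_unlinked, axis=0)
    return gn

gnu = ld_prune(gnr, size=500, step=200, threshold=.1, n_iter=5)
```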
5 iterations is probably more than necessary for this dataset, as you can see not many SNPs are removed after the first few iterations.
I’ve used a sliding window size of 500 SNPs here, which is larger than others typically use. Out of interest, how many SNPs would be removed if we used a smaller window and just one iteration?
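A hedged comparison; size=100 and step=20 are my guesses at what a "smaller window" might mean here.

```python
loc_unlinked = allel.locate_unlinked(gnr, size=100, step=20, threshold=.1)
print(np.count_nonzero(~loc_unlinked), 'variants would be removed')
```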
So with this dataset, using a larger window and multiple iterations finds and removes a lot more correlated SNPs. This is probably related to the fact that there are a lot of rare variants in the data, and so a larger window is required to find variants in linkage.
Let’s take a look at how much LD is left after LD pruning.
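Once more with the plot_ld sketch:

```python
plot_ld(gnu[:1000], 'Pairwise LD after LD pruning.')
```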
The data are relatively small now after downsampling and LD-pruning, so we can bring the data out of chunked storage and into memory uncompressed, which is necessary for PCA.
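Materialising the chunked array as an uncompressed NumPy array:

```python
gnu = gnu[:]
gnu.shape
```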
PCA via conventional SVD
Let’s run a conventional PCA analysis of the LD-pruned genotype data.
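A sketch of the call, using the Patterson scaler discussed further below:

```python
# returns sample coordinates plus a fitted model with explained variance ratios
coords1, model1 = allel.pca(gnu, n_components=10, scaler='patterson')
```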
To help visualise the results, I need to pull in some metadata about which population each individual mosquito belongs to.
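A sketch of loading the metadata with pandas; the filename is an assumption.

```python
import pandas

df_samples = pandas.read_csv('samples.meta.txt', sep='\t', index_col='index')
df_samples.head()
```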
| index | ox_code | src_code | sra_sample_accession | population | country | region | contributor | contact | year | m_s | sex | n_sequences | mean_coverage | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AB0085-C | BF2-4 | ERS223996 | BFS | Burkina Faso | Pala | Austin Burt | Sam O'Loughlin | 2012 | S | F | 89905852 | 28.01 | 11.150 | -4.235 |
| 1 | AB0087-C | BF3-3 | ERS224013 | BFM | Burkina Faso | Bana | Austin Burt | Sam O'Loughlin | 2012 | M | F | 116706234 | 36.76 | 11.233 | -4.472 |
| 2 | AB0088-C | BF3-5 | ERS223991 | BFM | Burkina Faso | Bana | Austin Burt | Sam O'Loughlin | 2012 | M | F | 112090460 | 23.30 | 11.233 | -4.472 |
| 3 | AB0089-C | BF3-8 | ERS224031 | BFM | Burkina Faso | Bana | Austin Burt | Sam O'Loughlin | 2012 | M | F | 145350454 | 41.36 | 11.233 | -4.472 |
| 4 | AB0090-C | BF3-10 | ERS223936 | BFM | Burkina Faso | Bana | Austin Burt | Sam O'Loughlin | 2012 | M | F | 105012254 | 34.64 | 11.233 | -4.472 |
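The plotting code is not preserved in this copy; below is a hedged sketch of how the first four components might be visualised, coloured by population. The colour assignments and figure layout are my own choices.

```python
populations = df_samples.population.unique()
# arbitrary colour per population
pop_colours = {pop: plt.cm.tab10(i % 10) for i, pop in enumerate(populations)}

def plot_pca_coords(coords, model, pc1, pc2, ax):
    # one point per sample, coloured by population of origin
    x, y = coords[:, pc1], coords[:, pc2]
    for pop in populations:
        flt = (df_samples.population == pop).values
        ax.plot(x[flt], y[flt], marker='o', linestyle=' ',
                color=pop_colours[pop], label=pop, markersize=6)
    ax.set_xlabel('PC%s (%.1f%%)' % (pc1 + 1, model.explained_variance_ratio_[pc1] * 100))
    ax.set_ylabel('PC%s (%.1f%%)' % (pc2 + 1, model.explained_variance_ratio_[pc2] * 100))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
plot_pca_coords(coords1, model1, 0, 1, ax1)  # PC1 vs PC2
plot_pca_coords(coords1, model1, 2, 3, ax2)  # PC3 vs PC4
ax1.legend(loc='upper left', fontsize=8)
fig.tight_layout()
```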
Looking at the left-hand plot of PC1 versus PC2, there is a clear separation of individuals into 6 different clusters. This indicates there are at least 6 genetically distinct populations represented by the mosquitoes we’ve sequenced. The plot of PC3 vs PC4 gives us additional evidence that certain populations (GAS and KES) are genetically distinct from each other and the rest, but doesn’t reveal any new clusters.
Effect of LD pruning
What would happen if we ran PCA on the data without removing correlated SNPs?
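As a sketch, we can re-run the PCA on the randomly downsampled but un-pruned matrix (gnr from above), materialising it into memory first:

```python
coords2, model2 = allel.pca(gnr[:], n_components=10, scaler='patterson')
```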
Although all of the same population sub-divisions are visible in the first four components, they are resolved in a very different way. The first two components are now driven strongly by two populations, Angola (AOM) and Kenya (KES), and further population structure is not clearly resolved until PC3 and PC4.
It is interesting to note that the Kenyan and Angolan populations are the two populations with the lowest heterozygosity. In particular, almost all Kenyan samples have very long runs of homozygosity, suggesting a recent population crash. I would hazard a guess that, in particular for Kenya, there is long-range LD which is affecting the PCA. When we used the aggressively LD-pruned data in Figure 4 above, this effect was reduced.
Effect of scaling
Patterson et al. (2006) proposed scaling the data to unit variance at each SNP, assuming that the alleles are approximately binomially distributed. McVean (2009) remarks that scaling the data in this way should have little effect, although it will upweight rare variants (i.e., SNPs where the minor allele is at low frequency in the dataset). Let’s return to using the LD pruned data, and see what happens if we don’t use Patterson’s scaling method but instead just centre the data.
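A sketch of the centring-only run. If I read the scikit-allel docs correctly, passing scaler=None centres each SNP without scaling to unit variance; treat this as an assumption.

```python
coords3, model3 = allel.pca(gnu, n_components=10, scaler=None)
```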
Here again the same clusters are visible but are resolved in a different way. Also, note that more of the total variance is explained by the first four components than when using the Patterson scaler. As McVean (2009) suggests, I would guess that both effects are due to the weighting of rare variants. When rare variants are upweighted, any subtle population structure in the data is resolved more clearly. However, there are a lot of rare variants in this dataset, and so the total amount of variance explained by the first few components goes down.
Effect of unequal sample sizes
McVean (2009) provides a very elegant demonstration of what can happen if different populations are not equally represented in your dataset. If there are many more samples from one particular population, this has the effect of warping the principal components.
In Ag1000G phase one there are a lot more samples from Cameroon (CMS) than any of the other locations.
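We can confirm this from the metadata:

```python
df_samples.population.value_counts()
```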
What would happen if we randomly pick a subset of CMS samples, to achieve a more even representation?
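One hedged way to do that: keep a random subset of the CMS samples (the cap of 50 is an arbitrary choice for illustration) and re-run the PCA on the remaining columns.

```python
np.random.seed(42)
loc_cms = (df_samples.population == 'CMS').values

# choose 50 CMS samples at random, keep all samples from other populations
keep_cms = np.random.choice(np.nonzero(loc_cms)[0], size=50, replace=False)
loc_keep = ~loc_cms
loc_keep[keep_cms] = True

coords5, model5 = allel.pca(gnu[:, loc_keep], n_components=10, scaler='patterson')
```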
Now the results are similar to the original PCA we plotted in Figure 4; however, PC2 appears to be more balanced. So sample size clearly does matter. However, there is a chicken-and-egg problem here. If you are using PCA to discover population structure in some collection of individuals, you won't know a priori if any particular population is overrepresented. Perhaps in that situation, an initial round of PCA to discover population structure can be followed up with a second round, downsampling any populations within which you observe no differentiation.
Randomized PCA
Randomized PCA is an alternative to conventional PCA. I don’t claim to understand the details, but apparently it uses an approximation to estimate the top principal components only, rather than evaluating all principal components as in a conventional SVD. So it should be faster and use less memory.
Let’s run a randomized PCA on the Ag1000G data and compare the results with the conventional PCA.
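Using the convenience function in scikit-allel, which delegates to the randomized SVD in scikit-learn:

```python
coords4, model4 = allel.randomized_pca(gnu, n_components=10, scaler='patterson')
```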
For the first four components at least, the results are indistinguishable from the conventional PCA.
Let’s compare performance.
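The original timing outputs are not preserved here; a simple single-run wall-clock comparison might look like this.

```python
import timeit

t_conv = timeit.timeit(lambda: allel.pca(gnu, n_components=10, scaler='patterson'), number=1)
t_rand = timeit.timeit(lambda: allel.randomized_pca(gnu, n_components=10, scaler='patterson'), number=1)
print('conventional SVD: %.1f s; randomized SVD: %.1f s' % (t_conv, t_rand))
```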
What about memory usage?
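One hedged way to measure peak memory, assuming the memory_profiler package is installed:

```python
from memory_profiler import memory_usage

peak_conv = max(memory_usage((allel.pca, (gnu,), dict(n_components=10, scaler='patterson'))))
peak_rand = max(memory_usage((allel.randomized_pca, (gnu,), dict(n_components=10, scaler='patterson'))))
print('peak memory (MiB): conventional %.0f, randomized %.0f' % (peak_conv, peak_rand))
```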
So the randomized PCA is faster, scales better with more samples, and uses around half of the memory required by conventional SVD.
Conclusions
- LD pruning makes a difference. For NGS data, a larger window size and/or multiple rounds of pruning may be required to deal with regions of long-range LD. LD pruning may also impact different populations in different ways, if populations have different levels of LD.
- Scaling input data to unit variance using the method of Patterson et al. (2006) makes a small but noticeable difference, increasing the ability to resolve distinct populations within higher principal components.
- Unequal sample sizes warp the principal components, as predicted by McVean (2009).
- Randomized PCA produces results that are almost indistinguishable from conventional PCA, while running faster and using less memory. However, preparing the data (LD pruning) can also take a long time, so it would be good to find a way to optimise that step too.
Further reading
- Patterson, N., Price, A. L., & Reich, D. (2006). Population structure and eigenanalysis. PLoS Genetics, 2(12), 2074–2093.
- McVean, G. (2009). A genealogical interpretation of principal components analysis. PLoS Genetics, 5(10), e1000686.
- Galinsky, K. J., Bhatia, G., Loh, P.-R., Georgiev, S., Mukherjee, S., Patterson, N. J., & Price, A. L. (2015). Fast principal components analysis reveals independent evolution of ADH1B gene in Europe and East Asia. bioRxiv.
Post-script: Randomized PCA and lower components
When I originally wrote this post, I only looked at the first four components. These account for most of the variance and so capture the major signals of population subdivision.
For the first four components, conventional and randomized PCA are basically the same. However, recently I looked into the lower components, where there are some interesting signals of population structure, but less variance is captured. For these lower components, results from conventional and randomized PCA are not so similar.
So an important caveat when using randomized PCA is that lower components may not be resolved very well, compared with conventional PCA.
Post-script: OpenBLAS
The original version of this post was run with NumPy built without any linear algebra optimisations. Following a comment from Andreas Noack I have re-run this notebook with NumPy built with OpenBLAS. This significantly speeds up the runtime for both conventional and randomized PCA. With the data size used in this notebook, the bottleneck is now more data preparation (LD pruning) than running the PCA itself.