J Han / N Uberoi (@1.57) vs Y Zhang / Y Zhao C (@2.25)
10-09-2019

Our Prediction:

J Han / N Uberoi will win

J Han / N Uberoi – Y Zhang / Y Zhao C Match Prediction | 10-09-2019 01:00

Characterization of sequences fully unaligned to GRCh38 primary assembly sequences in 185 deeply sequenced Han Chinese genomes. a Length distribution of fully unaligned sequences. b The total length of fully unaligned sequences (Mb) obtained by using lower identity (80–90%) to remove redundant sequences. c The sequence count and sequence size when aligning the sequences to GRCh38 primary assembly sequences with lower sequence identity (80–90%). d Simulation of the total fully unaligned sequences using different numbers of individuals. e The percentage of repeat elements reported by RepeatMasker; hs38d1 is 5.8 Mb novel sequences from SGDP, and GRCh38 is the primary assembly sequences of the human reference genome GRCh38. The RepeatMasker masked result of GRCh38 was downloaded from http://www.repeatmasker.org/species/hg.html. f Validation of fully unaligned sequences by aligning to other available human sequences (90% identity).

In this manuscript, we considered the non-reference genes. This approach could also be extended to the study of other genomic variations, such as copy number variations and other structural variations. For example, the misassembled contigs could be further analyzed to call these large structural variations, which are less accessible to reference-based variation calling tools. In addition, we used two independent cohorts to show the power of HUPAN for pan-genome analysis. All individuals were sampled from the Han Chinese population; this analysis could be extended to other populations to capture global genetic variation, and to various tumors to explore the dynamic variations of cancer genomes.

Membrane proteins were extracted as previously described by Li [21], with minor modifications. Briefly, microdissected tissue was minced on ice and manually homogenized with a glass homogenizer containing precooled homogenization buffer (200 mM mannitol, 70 mM sucrose, 10 mM Tris-HCl, pH 7.4, 1 mM EDTA, 0.5 mM EGTA, 1 mM PMSF, 0.2 mM Na3VO4, and 1 mM NaF). The homogenate was centrifuged at 10,000 g at 4 °C for 15 min. The supernatant was collected and ultracentrifuged at 100,000 g for 1 hour at 4 °C to purify the cell membrane. The supernatant from this step was collected as a control and used to assess the purity of the extracted membrane proteins by Western blotting. The pellets were resuspended in lysis buffer (8 M urea, 30 mM HEPES, 0.5% SDS, 1 mM PMSF, 2 mM EDTA, and 10 mM DTT). The solution was dispersed by sonication for 5 min (power 180 W, pulse 2 s on and 3 s off), then centrifuged at 20,000 g for 30 min. The supernatant was collected and the proteins were assayed by the Bradford method. Purity was validated by Western blotting using a plasma membrane marker enzyme (Na+/K+-ATPase) and a mitochondrial marker (prohibitin) [21].

After the non-reference sequences were merged, they were clustered to remove the redundant sequences across individuals. We obtained 52.90 Mb of fully unaligned sequences and 46.76 Mb of partially unaligned sequences. More than 20 Mb of the 52.90 Mb sequences were classified into microorganisms (Additional file 1: Figure S5a). The majority of the partially unaligned sequences were classified into human and other primates (Additional file 1: Figure S5b), indicating that these sequences are indeed from human genomes. After removing sequence contamination from microorganisms and non-primate eukaryotes, we identified 28,622 fully unaligned sequences with a total length of 30.72 Mb and 8320 partially unaligned sequences with a total length of 46.63 Mb (Additional file 1: Figure S6).

In HUPAN, we proposed a hierarchical strategy to extract the non-reference sequences (see the Methods section). Compared with EUPAN, this new strategy could substantially reduce both CPU time and memory consumption with little loss in precision (Table 1 and Additional file 1: Figure S2). Obvious stratification was observed in the fully unaligned sequences before removing contamination sequences (Additional file 1: Figure S3), which were mainly from the bacterium Helicobacter pylori, a major infectious agent associated with gastric diseases, in several individuals (Additional file 1: Figure S4). After discarding the potential contamination, ~5 Mb of fully unaligned sequences and ~6 Mb of partially unaligned sequences were obtained for each individual (Fig. 2a). In addition, the GC content (%) of fully unaligned sequences was slightly higher than that of partially unaligned sequences. This result is consistent with previous studies [3, 9, 26] (Fig. 2b).

The explosive growth of human whole-genome sequencing data brings significant challenges and tremendous opportunities to study the pan-genome of a specific population [21]. However, constructing the pan-genome sequences from hundreds of individual genomes is a huge challenge. Recently, we reported a tool, EUPAN [22], based on a map-to-pan strategy and applied it to more than 3000 rice genomes [13]. Nevertheless, due to the large size of the human genome, EUPAN cannot be applied to human pan-genome analysis because of the huge memory requirement of the de novo assembly step (more than 500 Gb of memory is needed to assemble a human genome from 30-fold sequencing data. See more details in Additional file 1: Supplementary methods). Several previous studies reported non-reference genome sequences using the approach of pseudo de novo assembly [4, 6, 8, 20]. Instead of using all reads, only the unmapped reads were extracted to conduct de novo assembly [8, 20]. We compared the assemblies obtained from all reads and from unmapped reads on simulated sequencing data, and the results suggested that the pseudo de novo assembly method may underestimate the size of non-reference sequences and, at the same time, produce more misassembled sequences (Additional file 1: Table S1). If all reads were used, aligning hundreds of assembled genomes to the human reference genome to extract the non-reference sequences, and distinguishing the non-human sequences introduced during sampling, sequencing, and other procedures, are further challenges that need to be addressed.

Lung adenocarcinoma and matched adjacent normal lung tissue samples were obtained from 10 patients who underwent surgery at the Second Affiliated Hospital of the Medical School of Xi'an Jiaotong University. All patients with lung adenocarcinoma were confirmed by pathological diagnosis. None of the patients had received chemotherapy or radiotherapy before surgery. Adjacent normal tissue was obtained at least 5 cm away from the primary tumor. All samples were immediately snap-frozen in liquid nitrogen and stored at -80 °C until analyzed. The 10 pairs of lung cancer and matched normal lung tissue were used for comparative proteomic analysis and Western blotting. For immunohistochemical analysis, 62 formalin-fixed, paraffin-embedded lung cancer specimens (41 lung adenocarcinoma, 21 lung squamous cell carcinoma) and 24 normal lung tissues from surgical resections were obtained from the Second Affiliated Hospital, Xi'an Jiaotong University and Shaanxi Cancer Hospital for this retrospective study. The patients signed informed consent forms. This study was approved by the local ethics committee.

The cell membrane is involved in many biological functions, including small-molecule transport, cell-cell and cell-substrate recognition and interaction, and cell signal transduction and communication [10,11]. Membrane proteins account for approximately 30% of the whole cell proteome and are known to be involved in cell proliferation, cell adhesion, and tumor cell invasion. They are also pivotal to the development, growth, angiogenesis, and metastasis of tumors [12-14]. Analyzing membrane proteomes may help us understand carcinogenic mechanisms and promote the discovery of new potential tumor biomarkers and therapeutic targets.

DNA contamination from other organisms may lead to imprecise outcomes and should be considered in any sequencing project [32]. This is particularly important when we focus on the non-reference sequences. There are several possible sources of contaminants, such as the biological source itself and DNA present in reagents or instruments [33]. To obtain high-confidence non-reference sequences derived from the human genome rather than from contamination, we proposed a strict filtering step to drop as many potential contamination sequences as possible. We used a local alignment method to classify and exclude the sequences labeled as microorganisms or non-primate eukaryotes. The major source of non-human sequences was microorganisms, and the majority of the remaining sequences were labeled as human.
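The exclusion step described above can be sketched in a few lines of Python. This is only an illustration, not the HUPAN implementation: the label strings, the `EXCLUDED_LABELS` set, and the `drop_contaminants` function are hypothetical names, and the sketch assumes each contig has already been assigned a single taxonomic label by some upstream alignment-based classifier.

```python
# Hypothetical label names; the real classifier output may differ.
EXCLUDED_LABELS = {"microorganism", "non-primate eukaryote"}


def drop_contaminants(labels):
    """Return the contig IDs whose label is not in the excluded set.

    `labels` maps a contig ID to the taxonomic label assigned to it.
    Contigs with any other label (for example "human", "other primate",
    or "unclassified") are kept, which is an assumption of this sketch.
    """
    return [contig for contig, label in labels.items()
            if label not in EXCLUDED_LABELS]


example = {
    "contig_1": "human",
    "contig_2": "microorganism",
    "contig_3": "other primate",
    "contig_4": "non-primate eukaryote",
}
kept = drop_contaminants(example)
print(kept)
```

Keeping unlabeled contigs is a design choice of this sketch; a stricter filter could drop anything not positively labeled as human or primate.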

In this study, we performed iTRAQ labeling followed by LC-MS/MS to identify differential protein expression profiles of cell membranes from pooled lung adenocarcinoma and matched normal lung tissue samples. These profiles were subjected to quantitative proteomics, whereby S100A14 was found to be overexpressed in lung adenocarcinoma compared with normal lung tissue. S100A14 expression was further confirmed in clinical samples by Western blotting. We then investigated the relationship of S100A14 expression with clinicopathological features and underlying molecular mechanisms in lung cancer patients.

Summary of non-reference sequences for individual genomes. a The total length (Mb) and b the GC content (%) of unaligned contigs (including fully unaligned sequences and partially unaligned sequences) obtained for each individual after removing potential contamination. In b, the solid black line represents the GC content of the primary sequence in GRCh38 (40.87%); the dotted lines represent the GC content of novel sequences of the YH genome [26] (red, 44.11%), the 5.8 Mb novel contigs from SGDP [9] (green, 43.43%), and novel sequences of the NA18507 genome (orange, 42.87%).
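The GC content plotted in panel b is the percentage of G and C bases in a sequence. A minimal Python sketch follows, assuming that ambiguous bases such as N are left out of the denominator; the source does not state how such bases were handled, and the function name `gc_content` is illustrative.

```python
def gc_content(sequence):
    """Return the GC content of a DNA sequence as a percentage.

    Only A, C, G, and T are counted; other characters (such as N)
    are ignored, which is an assumption of this sketch.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    counted = gc + seq.count("A") + seq.count("T")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / counted


print(gc_content("GGCCAATT"))
```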

The extracted membrane proteins were reduced with 10 mM DTT and alkylated with 55 mM IAM. They were then precipitated with cold acetone, stored at -20 °C for 3 h, and concentrated by centrifuging at 20,000 g for 30 min. The precipitates were resuspended in solution buffer (50% TEAB, 0.1% SDS). Protein digestion and iTRAQ labeling were performed according to the iTRAQ kit protocol (Applied Biosystems). Protein solutions (100 µg) were digested with 1 µg/µl trypsin solution at 37 °C overnight and labeled with iTRAQ tags. The lung cancer and normal lung tissue samples were labeled with iTRAQ117 and iTRAQ118, respectively.

Recently, some studies reported S100A14 expression in various cancers, but the results were inconsistent. Some data have indicated that S100A14 is upregulated in several cancers, including ovarian, breast, and hepatocellular cancer [35,38]. On the other hand, some data suggest that S100A14 is downregulated in kidney, colon, rectal, esophageal, and oral carcinoma [35,39]. These apparent discrepancies suggest that S100A14 plays different roles in different tumors and at different developmental stages, although the underlying mechanism remains unclear. Although northern blot hybridization has shown that S100A14 mRNA expression is upregulated in lung tumors [35], S100A14 protein expression in lung cancer remains unclear. Therefore, using IHC, we further detected S100A14 expression in paraffin-embedded archival tissue specimens and evaluated the relationship between S100A14 and clinicopathological characteristics in patients with lung cancer. We found that S100A14 expression was increased in lung adenocarcinoma and squamous cell carcinoma compared with that in normal lung tissue. Furthermore, our results showed that S100A14 was expressed at higher levels in well or moderately differentiated lung cancer than in poorly differentiated lung cancer, which was consistent with a previous report [40]. The IHC data indicated that S100A14 may play a role in cell differentiation.

We first optimized the assembly parameters based on simulation data (Additional file 1: Supplementary methods and Table S2). We selected SGA instead of SOAPdenovo2 [25] due to its high assembly quality and distinctly low memory consumption. Then, we conducted de novo assembly for the 185 newly sequenced Han Chinese genomes using all reads (see the Methods section). As a result, the average size of the 185 assembled genomes was 2,720,566,559 ± 7,126,135 bp and the average contig N50 was 8042 ± 387 bp (Additional file 1: Figure S1).
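For readers unfamiliar with the contig N50 statistic reported above: it is the length L such that contigs of length L or longer together cover at least half of the total assembly size. A minimal Python sketch of the standard definition follows; the function name `n50` and the toy contig lengths are illustrative and not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example: total length 54, and 10 + 9 + 8 = 27 reaches half of it.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```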