Sunday, June 14, 2009

Re: robots.txt with a 301 redirect

<<Anyway, your suggestion of removing the robots.txt or the Disallow: /
would lead to files stored at domain1.com getting indexed wouldn't
it ;) >>
Sort of.
Robots will explore any page that is not disallowed. But as soon as they start exploring, they hit the 301 redirect, which leads them to the new pages instead. So, very shortly, every robot that wants to update its info on the "old pages" will notice that the content has been moved and stop indexing the "old content".
-- over time, humans coming to your site from search engine results will be led directly to the new pages rather than the old ones
-- any human visitor coming directly to an "old page" will be silently transferred to the "new page", whether they come from their own bookmarks or from some third-party inbound link
-- and of course, as stated initially, all robots coming directly to your old site will first check in robots.txt that the page is not disallowed, then try to access the old page and be redirected to the new page.
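
To illustrate why the robots.txt check decides whether the redirect is ever seen, here is a minimal Python sketch using the standard library's urllib.robotparser. The three robots.txt bodies and the helper name can_crawl are made up for illustration, the URLs are the examples from earlier in this thread, and real crawlers may interpret rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if this robots.txt body lets `agent` request `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt bodies for the old domain (assumptions, not real files).
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"
SELECTIVE = "User-agent: *\nDisallow: /privatephoto.jpg\n"

REDIRECTED = "http://domainone.com/information.html"  # answered with a 301
PRIVATE = "http://domainone.com/privatephoto.jpg"     # real file on the old domain

# With "Disallow: /" the robot never requests the URL, so it never sees the 301.
print(can_crawl(BLOCK_ALL, REDIRECTED))   # False
# Without the disallow, the robot requests the URL and can follow the redirect.
print(can_crawl(ALLOW_ALL, REDIRECTED))   # True
# Disallowing only the real file keeps it blocked but leaves the redirect reachable.
print(can_crawl(SELECTIVE, PRIVATE))      # False
print(can_crawl(SELECTIVE, REDIRECTED))   # True
```

The last case is one way to read "remove unneeded disallows": block only the paths that really live on the old domain and leave everything that redirects open to robots.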

Of course, I would remove any "Sitemap: " line from the old site's robots.txt, but I would make sure that the robots.txt on the new site has a "Sitemap: " line pointing to an updated sitemap.xml listing the pages of the new site.

May I suggest that you:
1 - Run one or several "check searches" that return results pointing to the old pages, and save a local copy of these pages
2 - Place the .htaccess 301 redirects on the old site
3 - Remove unneeded disallows from robots.txt on the old site, or remove this robots.txt altogether
4 - Rerun and monitor your "check searches" on a weekly basis to see how the results evolve over time?
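
Alongside the searches, the redirects themselves can be checked with a small script. This is only a sketch under assumptions: the function names are invented for illustration, it relies on Python's standard http.client, and it assumes the old server answers HEAD requests and that the redirect target uses exactly the new host name (no "www." variant).

```python
import http.client
from urllib.parse import urlsplit


def fetch_status_and_location(url, timeout=10.0):
    """Send one HEAD request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_permanent_redirect_to(status, location, new_host):
    """True only for a 301 whose Location points at new_host."""
    if status != 301 or not location:
        return False
    return urlsplit(location).hostname == new_host


def check_old_urls(old_urls, new_host, fetch=fetch_status_and_location):
    """Map each old URL to True/False: does it 301 to the new host?"""
    return {
        url: is_permanent_redirect_to(*fetch(url), new_host)
        for url in old_urls
    }
```

Calling check_old_urls(["http://domainone.com/information.html"], "domaintwo.com") once a week would then show whether the old URLs still answer with a permanent redirect to the new domain. The fetch parameter is there so the check can be tried with canned responses before pointing it at the live site.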

2009/6/9 Ed Galligan <ed.galligan@gmail.com>

@Bernard
Ha. Of course, that would be easy wouldn't it. Seems we're kinda going
in circles here. Don't worry though, as I said this is not a huge
priority, something I'd really LIKE to get working, but don't
necessarily NEED. I'm inquiring here as a last resort before settling
for some compromise.

Anyway, your suggestion of removing the robots.txt or the Disallow: /
would lead to files stored at domain1.com getting indexed wouldn't
it ;) which I'd ideally like to avoid.

I want urls on domain1.com that redirect to domain2.com to be indexed.
But I want to prevent urls on domain1.com that really lead to files on
domain1.com being indexed.

Impossible?

On Jun 8, 1:39 pm, Bernard Savonet <bernard.savo...@gmail.com> wrote:
> 2009/6/4 Ed Galligan <ed.galli...@gmail.com>
>
> > @Bernard
> > Unfortunately, it does not currently work fine. Google reads the
> > robots.txt file BEFORE attempting to follow the 301 - once it reads
> > the Disallow: / it then stops and never accesses the 301 at all,
> > meaning the second domain never gets indexed.
>
> B-)) so remove the disallow or the robots.txt
> Then the htaccess and redirect will work
>
>
> > On Jun 4, 12:09 pm, Bernard Savonet <bernard.savo...@gmail.com> wrote:
> > > 2009/6/2 Ed Galligan <ed.galli...@gmail.com>
>
> > > > I already have a generic regexp mod_rewrite 301 in .htaccess doing
> > > > this. Is that what you meant?
>
> > > Yes.
> > > So everything should be fine.
>
> > > > On Jun 1, 9:41 pm, Bernard Savonet <bernard.savo...@gmail.com> wrote:
> > > > > 1- It's possible
>
> > > > > 2 - If you don't want to get mad, it would be better to have some
> > > > > generic mechanism so that you 301 redirect each page to the correct
> > > > > page on the new domain.
>
> > > > > 2009/6/1 Ed Galligan <ed.galli...@gmail.com>
>
> > > > > > I'm trying to figure out how to do something that might just be
> > > > > > impossible. Before I give up, I'll see if anyone here can possibly
> > > > > > add some insight.
>
> > > > > > I have two domains:
> > > > > > 1) domainone.com which I don't want indexed by search engines
> > > > > > 2) domaintwo.com which I do want indexed by search engines
>
> > > > > > Any content I have on domainone.com, I want to hide from search
> > > > > > engines, but then I have a 301 .htaccess redirect sending visitors
> > > > > > to domaintwo.com for any addresses that don't match content on
> > > > > > domainone.com
>
> > > > > > So for example, say I have a file called privatephoto.jpg that's at
> > > > > > http://domainone.com/privatephoto.jpg
> > > > > > Then I have another file called information.html at
> > > > > > http://domaintwo.com/information.html
>
> > > > > > I want to hide http://domainone.com/privatephoto.jpg from Google, BUT
> > > > > > I want Google to be able to see http://domainone.com/information.html
> > > > > > (which is a 301 redirect)
>
> > > > > > Is this completely impossible?
>
> > > > > --
> > > > > --------------
> > > > > Les peintures de Marine: http://markaonline.free.fr/accueil.htm





--
--------------
Les peintures de Marine: http://markaonline.free.fr/accueil.htm


